Abstract: This work investigates a central problem in
steganography, that is: How much data can safely be hidden
without being detected? To answer this question, a formal
definition of steganographic capacity is presented. Once this has
been defined, a general formula for the capacity is developed. The
formula is applicable to a very broad spectrum of channels due
to the use of an information-spectrum approach. This approach
allows for the analysis of arbitrary steganalyzers as well as
non-stationary, non-ergodic encoder and attack channels.

After the general formula is presented, various simplifications 
are applied to gain insight into example hiding and detection 
methodologies. Finally, the context and applications of the work 
are summarized in a general discussion. 

Index Terms: Steganographic capacity, stego-channel,
steganalysis, steganography, information theory, information
spectrum



I. Introduction 

A. Background 

SHANNON'S pioneering work provides bounds on the 
amount of information that can be transmitted over a noisy 
channel. His results show that capacity is an intrinsic property 
of the channel itself. This work takes a similar viewpoint 
in seeking to find the amount of information that may be 
transferred over a stego-channel, as seen in Figure 1.

The stego-channel is equivalent to the classic channel with 
the addition of the detection function and attack channel. For 
the classic channel, a transmission is considered successful if 
the decoder properly determines which message the encoder 
has sent. In the stego-channel, a transmission is successful 
only if the decoder properly determines the sent message and 
the detection function is not triggered. 

This additional constraint on the channel use leads to the 
fundamental view that the capacity of a stego-channel is 
an intrinsic property of both the channel and the detection 
function. That is to say, the properties of the detection function 
influence the capacity just as much as the noise in the channel. 

B. Previous Work 

There have been a number of applications of information
theory to the steganographic capacity problem [1], [2], [3].
These works give capacity results under distortion constraints
on the hider as well as on an active adversary. The additional
constraint that the stego-signal retains the same distribution as
the cover-signal serves as the steganalysis detection function.

Somewhat less work exists exploring capacity with arbitrary
detection functions. These works are written from a
steganalysis perspective [4], [5] and accordingly give heavy
consideration to the detection function.

This work differs from previous work in a number of 
aspects. Most notable is the use of information-spectrum meth- 
ods that allow for the analysis of arbitrary detection algorithms 
and channels. This eliminates the need to restrict interest 
to detection algorithms that operate on sample averages or 
behave consistently. Instead, the detection functions may be 
instantaneous, meaning the properties of a detector for n 
samples need not have any relation to the same detector 
for n + 1 samples. Additionally, the typical restriction that
the channel under consideration be consistent, ergodic or 
stationary is also lifted. 

Another substantial difference is the presence of noise
before the detector. This placement enables the modeling of
common signal processing distortions such as compression and
quantization. The location of the noise adds complexity
not only because of confusion at the decoder, but also because
a signal carefully crafted to avoid detection may be corrupted
into one that will trigger the detector.

Finally, the consideration of a cover-signal and distortion
constraint in the encoding function is omitted. This is due
to the view that steganographic capacity is a property of the
channel and the detection function. This viewpoint, along with
the above differences, makes a direct comparison to previous
work somewhat difficult, although possible with a number of
simplifications explored in Section V.

C. Groundwork 

This chapter lays the groundwork for determining the
amount of information that may be transferred over the
channel shown in Figure 1. Here, the adversary's goal is to disrupt
any steganographic communication between the encoder and
decoder. To accomplish this, a steganalyzer is used to detect
steganographic messages and an attack function is used to
corrupt undetected messages.

We now formally define each of the components in the 
system, beginning with the random variable notation. 

1) Random Variables: Random variables are denoted by
capital letters, e.g. $X$. Realizations of these random variables
are denoted as lowercase letters, e.g. $x$. Each random variable
is defined over a domain denoted with a script letter, e.g. $\mathcal{X}$.
A sequence of $n$ random variables is denoted $X^n = (X_1, \ldots, X_n)$.
Similarly, an $n$-length sequence of random variable realizations
is denoted $\mathbf{x} = (x_1, \ldots, x_n) \in \mathcal{X}^n$. The probability of
$X$ taking value $x \in \mathcal{X}$ is $p_X(x)$.

Following a signal through Figure 1, we begin in the
space of $n$-length stego-signals, denoted $\mathcal{X}^n$. The signal then
undergoes some distortion as it travels through the encoder-channel.
This results in an element from the corrupted stego-signal
space $\mathcal{Y}^n$. Finally, the signal is attacked to produce
the attacked stego-signal in space $\mathcal{Z}^n$.

Fig. 1. Steganographic Channel

Fig. 2. Permissible and Impermissible Sets

2) Steganalyzer: The steganalyzer is a function $g_n : \mathcal{Y}^n \to \{0, 1\}$
that classifies a sequence of signals from $\mathcal{Y}^n$ into one of
two categories: containing steganographic information and not
containing steganographic information. The function is defined
as follows for all $\mathbf{y} \in \mathcal{Y}^n$,

$$ g_n(\mathbf{y}) = \begin{cases} 1, & \text{if } \mathbf{y} \text{ is steganographic} \\ 0, & \text{if } \mathbf{y} \text{ is not steganographic} \end{cases} \tag{1} $$

The specific type of function may be that of a support vector
machine, a Bayesian classifier, etc.
A steganalyzer sequence is denoted as,

$$ \mathbf{g} := \{g_1, g_2, g_3, \ldots\}, \tag{2} $$

where $g_n : \mathcal{Y}^n \to \{0, 1\}$.

The set of all $n$-length steganalyzers is denoted $\mathcal{G}_n$.

3) Permissible Set: For any steganalyzer $g_n$, the space of
signals $\mathcal{Y}^n$ is split into the permissible set and the
impermissible set.

The permissible set $\mathcal{P}_{g_n} \subseteq \mathcal{Y}^n$ is the inverse image of 0
under $g_n$,

$$ \mathcal{P}_{g_n} := g_n^{-1}(\{0\}) = \{\mathbf{y} \in \mathcal{Y}^n : g_n(\mathbf{y}) = 0\}. \tag{3} $$

The permissible set is the set of all signals of $\mathcal{Y}^n$ that the
given steganalyzer $g_n$ will classify as non-steganographic.

Since each steganalyzer has a binary range, a steganalyzer
sequence may be completely described by a sequence of
permissible sets. To denote a steganalyzer sequence in such
a way the following notation is used,

$$ \mathbf{g} = \{\mathcal{P}_1, \mathcal{P}_2, \mathcal{P}_3, \ldots\}, $$

where $\mathcal{P}_n \subseteq \mathcal{Y}^n$ is the permissible set for $g_n$.



4) Impermissible Set: The impermissible set $\mathcal{I}_{g_n} \subseteq \mathcal{Y}^n$ is
the inverse image of 1 under $g_n$,

$$ \mathcal{I}_{g_n} := g_n^{-1}(\{1\}) = \{\mathbf{y} \in \mathcal{Y}^n : g_n(\mathbf{y}) = 1\}. \tag{4} $$

For a given $g_n$ the impermissible set is the set of all signals
in $\mathcal{Y}^n$ that $g_n$ will classify as steganographic.

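Example 1: Consider the illustrative sum steganalyzer defined
for the binary channel outputs ($\mathcal{Y} = \{0,1\}$). The
steganalyzer is defined for $\mathbf{y} = (y_1, \ldots, y_n)$ as,

$$ g_n(\mathbf{y}) = \begin{cases} 1, & \text{if } \sum_{i=1}^{n} y_i > \lfloor n/2 \rfloor \\ 0, & \text{otherwise} \end{cases} \tag{5} $$

The permissible sets for n = 1, 2, 3, 4 are shown in Table I.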





TABLE I
Sum Steganalyzer Permissible Sets

$\mathcal{P}_1$ = {(0)}
$\mathcal{P}_2$ = {(0,0),(0,1),(1,0)}
$\mathcal{P}_3$ = {(0,0,0),(1,0,0),(0,1,0),(0,0,1)}
$\mathcal{P}_4$ = {(0,0,0,0),(1,0,0,0),(0,1,0,0),(0,0,1,0),(0,0,0,1),
       (1,1,0,0),(1,0,1,0),(1,0,0,1),(0,1,1,0),(0,1,0,1),(0,0,1,1)}
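The permissible sets listed in Table I can be checked by brute-force enumeration. The sketch below assumes the sum steganalyzer flags a sequence when more than floor(n/2) of its entries are 1; that threshold is inferred from the table entries, and the function names are illustrative only.

```python
from itertools import product


def sum_steganalyzer(y):
    """Return 1 (steganographic) when the number of ones exceeds floor(n/2).

    The threshold is an assumption inferred from the entries of Table I.
    """
    return 1 if sum(y) > len(y) // 2 else 0


def permissible_set(g_n, n, alphabet=(0, 1)):
    """Enumerate every length-n sequence that g_n classifies as 0."""
    return [y for y in product(alphabet, repeat=n) if g_n(y) == 0]


for n in range(1, 5):
    print(n, permissible_set(sum_steganalyzer, n))
```

Enumeration is exponential in n, so this is only practical for the small block lengths used in the table.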



5) Memoryless Steganalyzers: A memoryless steganalyzer
$\mathbf{g} = \{g_n\}_{n=1}^{\infty}$ is one where each $g_n$ is defined for
$\mathbf{y} = (y_1, y_2, \ldots, y_n)$ as,

$$ g_n(\mathbf{y}) = \begin{cases} 1, & \text{if } \exists\, i \in \{1, 2, \ldots, n\} \text{ such that } g(y_i) = 1 \\ 0, & \text{if } g(y_i) = 0 \ \forall\, i \in \{1, 2, \ldots, n\} \end{cases} \tag{6} $$

where $g \in \mathcal{G}_1$ is said to specify $g_n$ (and $\mathbf{g}$). To denote that a
steganalyzer sequence is memoryless the following notation
will be used: $\mathbf{g} = \{g\}$.

The analysis of the memoryless steganalyzer is motivated
by the current real-world implementation of detection systems.
As an example we may consider each $y_i$ to be a digital image
sent via email. When sending $n$ emails, the hider attaches one
of the $y_i$'s to each message. The entire sequence of images
is considered to be $\mathbf{y}$. Typically steganalyzers do not make
use of the entire sequence $\mathbf{y}$. Instead, each image is sequentially
processed by a given steganalyzer $g$, and if any of the $y_i$
trigger the detector the entire sequence of emails is treated as
steganographic.

For a memoryless steganalyzer $g_n$ defined by $g$, the
permissible set of $g_n$ is the $n$-fold product of $\mathcal{P}_g$,

$$ \mathcal{P}_{g_n} = \underbrace{\mathcal{P}_g \times \mathcal{P}_g \times \cdots \times \mathcal{P}_g}_{n}. \tag{7} $$
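The product structure of the memoryless permissible set can be verified on a toy alphabet. The following is a minimal sketch; the three-letter alphabet and the single-letter detector `g` (which flags the letter 2) are hypothetical choices made only for illustration.

```python
from itertools import product

ALPHABET = (0, 1, 2)  # hypothetical output alphabet


def g(letter):
    """Hypothetical single-letter detector: only the letter 2 is flagged."""
    return 1 if letter == 2 else 0


def memoryless(single_letter_detector):
    """Lift a single-letter detector to sequences: flag if any letter is flagged."""
    def g_n(y):
        return 1 if any(single_letter_detector(y_i) == 1 for y_i in y) else 0
    return g_n


def permissible_set(detector, n):
    """All length-n sequences over ALPHABET that the detector classifies as 0."""
    return {y for y in product(ALPHABET, repeat=n) if detector(y) == 0}


n = 3
P_g = [a for a in ALPHABET if g(a) == 0]          # single-letter permissible set
P_gn = permissible_set(memoryless(g), n)          # n-length permissible set
print(P_gn == set(product(P_g, repeat=n)))        # product structure holds
print(len(P_gn), len(P_g) ** n)
```

Because the n-length set is a Cartesian product, its size is the n-th power of the single-letter set size, which is what makes the memoryless case tractable.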



SUBMITTED TO IEEE TRANSACTIONS ON INFORMATION THEORY 






D. Channels 

We now define two channels. The first models inherent
distortions occurring between the encoder and detection
function, such as the compression of the stego-signal. The second
models a malicious attack by an active adversary, such as
cropping or additive noise. Both of these distortions are
considered to be outside the control of the encoder.

1) Encoder-Noise Channel: The encoder-noise channel is
denoted as $W^n$ where $W^n : \mathcal{Y}^n \times \mathcal{X}^n \to [0,1]$ and has the
following property for all $\mathbf{x} \in \mathcal{X}^n$,

$$ W^n(\mathbf{y}|\mathbf{x}) := \Pr\{Y^n = \mathbf{y} \,|\, X^n = \mathbf{x}\}. $$

The channel represents the conditional probabilities of the
steganalyzer receiving $\mathbf{y} \in \mathcal{Y}^n$ when $\mathbf{x} \in \mathcal{X}^n$ is sent.

The random variable $Y$ resulting from transmitting $X$
through the channel $W$ will be denoted as $X \xrightarrow{W} Y$.

We denote an arbitrary encoder-noise channel as the
sequence of transition probabilities,

$$ \mathbf{W} := \{W^1, W^2, W^3, \ldots\}. $$

2) Attack Channel: The attack function maps $A^n : \mathcal{Y}^n \to \mathcal{Z}^n$ as,

$$ A^n(\mathbf{z}|\mathbf{y}) = \Pr\{Z^n = \mathbf{z} \,|\, Y^n = \mathbf{y}\}. \tag{8} $$

The attack channel may be deterministic or probabilistic.

Similar to the encoder-noise channel, we denote an arbitrary
attack channel as the sequence of transition probabilities,

$$ \mathbf{A} := \{A^1, A^2, A^3, \ldots\}. $$

3) Encoder-Attack Channel: The encoder-attack channel (or simply
channel) is a function $Q^n : \mathcal{X}^n \to \mathcal{Z}^n$, defined to model the
effect of both the encoder-noise and attack channel,

$$ Q^n(\mathbf{z}|\mathbf{x}) = \sum_{\mathbf{y} \in \mathcal{Y}^n} A^n(\mathbf{z}|\mathbf{y})\, W^n(\mathbf{y}|\mathbf{x}). \tag{9} $$

The specification of $Q^n$ by $A^n$ and $W^n$ is denoted $Q^n = A^n \circ W^n$.

The arbitrary encoder-attack channel is a sequence of
transition probabilities,

$$ \mathbf{Q} = \{Q^1, Q^2, Q^3, \ldots\}. \tag{10} $$

We will express the relation between the encoder-noise channel,
attack channel and encoder-attack channel as $\mathbf{Q} = \mathbf{A} \circ \mathbf{W}$.

4) Memoryless Channels: In the case where channel
distortions act independently and identically on each input letter
$x_i$, we say it is a memoryless channel. In this instance the
$n$-length transition probabilities can be written as,

$$ W^n(\mathbf{y}|\mathbf{x}) = \prod_{i=1}^{n} W(y_i|x_i), \tag{11} $$

where $W$ is said to define the channel. To denote that a channel
is memoryless and defined by $W$ we will write $\mathbf{W} = \{W\}$.
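The channel composition in (9) and the memoryless product form in (11) can be sketched for single-letter channels as follows. The dictionary representation `channel[input][output]`, the helper names, and the binary symmetric channels used as the encoder-noise and attack channels are assumptions chosen for illustration.

```python
from math import prod


def compose(A, W):
    """Single-letter version of (9): Q(z|x) = sum_y A(z|y) * W(y|x).

    Channels are nested dicts, channel[input][output] = probability.
    """
    Q = {}
    for x, row in W.items():
        Q[x] = {}
        for y, w in row.items():
            for z, a in A[y].items():
                Q[x][z] = Q[x].get(z, 0.0) + a * w
    return Q


def memoryless_prob(W, y_seq, x_seq):
    """Product form (11): W^n(y|x) = prod_i W(y_i|x_i)."""
    return prod(W[x_i][y_i] for x_i, y_i in zip(x_seq, y_seq))


def bsc(p):
    """Binary symmetric channel with crossover probability p (illustrative)."""
    return {0: {0: 1 - p, 1: p}, 1: {0: p, 1: 1 - p}}


W = bsc(0.1)          # hypothetical encoder-noise channel
A = bsc(0.2)          # hypothetical attack channel
Q = compose(A, W)     # encoder-attack channel Q = A o W
print(Q[0][1])
print(memoryless_prob(W, (0, 1, 0), (0, 0, 0)))
```

For memoryless W and A, composing the single-letter channels and then taking the product gives the same n-length channel as composing the n-length channels directly.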



E. Encoder and Decoder 

The purpose of the encoder and decoder is to transmit
and receive information across a channel. The information to
be transferred is assumed to be from a uniformly distributed
message set denoted $\mathcal{M}_n$, with a cardinality of $M_n$.

The encoding function maps a message to a stego-signal,
i.e. $f_n : \mathcal{M}_n \to \mathcal{X}^n$. The element of $\mathcal{X}^n$ to which the $i$th
message maps is called the codeword for $i$ and is denoted $\mathbf{u}_i$.
The collection of codewords, $\mathcal{C}_n = \{\mathbf{u}_1, \ldots, \mathbf{u}_{M_n}\}$, is called
the code. The rate $R_n$ of an encoding function is given as

$$ R_n := \frac{1}{n} \log M_n. $$

The decoding function, $\phi_n : \mathcal{Z}^n \to \mathcal{M}_n$, maps a corrupted
stego-signal to a message. The decoder is defined by the set of
decoding regions for each message. The decoding regions
$\mathcal{D}_1, \ldots, \mathcal{D}_{M_n}$ are disjoint sets that cover $\mathcal{Z}^n$ and are defined such
that,

$$ \mathcal{D}_m := \phi_n^{-1}(\{m\}) = \{\mathbf{z} \in \mathcal{Z}^n : \phi_n(\mathbf{z}) = m\}, $$

for $m = 1, \ldots, M_n$.

Next, two important terms are presented that allow for the
analysis of steganographic systems. The first is the probability
the decoder makes a mistake, called the probability of error.
The second is the probability the steganalyzer is triggered,
called the probability of detection. In both cases they are
calculated for a given code $\mathcal{C}_n = \{\mathbf{u}_1, \ldots, \mathbf{u}_{M_n}\}$,
encoder-channel $W^n$, attack-channel $A^n$ and impermissible set $\mathcal{I}_{g_n}$
(corresponding to some $g_n$).

The probability of error in decoding the message can be
found as,

$$ \epsilon_n = \frac{1}{M_n} \sum_{i=1}^{M_n} Q^n(\mathcal{D}_i^c \,|\, \mathbf{u}_i), \tag{12} $$

where $Q^n = A^n \circ W^n$.

Similarly the probability of detection for the steganalyzer is
calculated as,

$$ \delta_n = \frac{1}{M_n} \sum_{i=1}^{M_n} W^n(\mathcal{I}_{g_n} \,|\, \mathbf{u}_i). \tag{13} $$
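The two quantities just defined, the probability of error and the probability of detection, can be computed exhaustively for a very small code. The sketch below assumes a memoryless binary symmetric encoder-noise channel with crossover 0.1, a passive adversary (so the encoder-attack channel equals the encoder-noise channel), a length-3 repetition code with majority decoding, and the sum steganalyzer with a floor(n/2) threshold; all of these choices are illustrative assumptions.

```python
from itertools import product
from math import prod


def seq_prob(channel, out_seq, in_seq):
    """Memoryless n-length transition probability, channel[input][output]."""
    return prod(channel[a][b] for a, b in zip(in_seq, out_seq))


def error_probability(code, regions, Q):
    """Average over codewords of Q^n(complement of D_i | u_i)."""
    n = len(code[0])
    total = 0.0
    for u, region in zip(code, regions):
        total += sum(seq_prob(Q, z, u)
                     for z in product((0, 1), repeat=n) if z not in region)
    return total / len(code)


def detection_probability(code, g_n, W):
    """Average over codewords of W^n(impermissible set | u_i)."""
    n = len(code[0])
    total = 0.0
    for u in code:
        total += sum(seq_prob(W, y, u)
                     for y in product((0, 1), repeat=n) if g_n(y) == 1)
    return total / len(code)


def sum_steganalyzer(y):
    """Assumed form: flag when the number of ones exceeds floor(n/2)."""
    return 1 if sum(y) > len(y) // 2 else 0


W = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.1, 1: 0.9}}   # hypothetical BSC(0.1)
Q = W                                            # passive adversary assumed
code = [(0, 0, 0), (1, 1, 1)]
all_z = list(product((0, 1), repeat=3))
regions = [{z for z in all_z if sum(z) <= 1},    # majority decoding
           {z for z in all_z if sum(z) >= 2}]

eps = error_probability(code, regions, Q)
delta = detection_probability(code, sum_steganalyzer, W)
print(eps, delta)
```

In this toy setting the code decodes reliably, yet the all-ones codeword is almost always flagged, which illustrates why a good channel code is not automatically a good stego-code.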

F. Stego-Channel 

A steganographic channel or stego-channel is a triple
(W, g, A), where W is an arbitrary encoder-noise channel,
g is a steganalyzer sequence, and A is an arbitrary attack
channel. To reinforce the notion that a stego-channel is defined
by a sequence of triples we will typically write
$(\mathbf{W}, \mathbf{g}, \mathbf{A}) = \{(W^n, g_n, A^n)\}_{n=1}^{\infty}$.

1) Discrete Stego-Channel: A discrete stego-channel is one
where at least one of the following holds:

$$ |\mathcal{X}| < \infty, \quad |\mathcal{Y}| < \infty, \quad |\mathcal{Z}| < \infty, \quad \text{or} \quad |\mathcal{P}_{g_n}| < \infty \ \ \forall n. $$

2) Discrete Memoryless Stego-Channel: A discrete
memoryless stego-channel (DMSC) is a stego-channel where,

1) (W, g, A) is discrete

2) W is memoryless

3) g is memoryless

4) A is memoryless

A DMSC is said to be defined by the triple $(W, g, A)$ and
will be denoted $(\mathbf{W}, \mathbf{g}, \mathbf{A}) = \{(W, g, A)\}$.

G. Steganographic Capacity 

The secure capacity tells us how much information can
be transferred with arbitrarily low probabilities of error and
detection.

An $(n, M_n, \epsilon_n, \delta_n)$-code (for a given stego-channel) consists
of an encoder and decoder. The encoder and decoder are
capable of transferring one of $M_n$ messages in $n$ uses of the
channel with an average probability of error of less than (or
equal to) $\epsilon_n$ and a probability of detection of less than (or
equal to) $\delta_n$.

1) Secure Capacity: A rate $R$ is said to be securely achievable
for a stego-channel $(\mathbf{W}, \mathbf{g}, \mathbf{A}) = \{(W^n, g_n, A^n)\}_{n=1}^{\infty}$ if
there exists a sequence of $(n, M_n, \epsilon_n, \delta_n)$-codes such that:

1) $\lim_{n \to \infty} \epsilon_n = 0$

2) $\lim_{n \to \infty} \delta_n = 0$

3) $\liminf_{n \to \infty} \frac{1}{n} \log M_n \ge R$

The secure capacity of a stego-channel (W, g, A) is
denoted as $C(\mathbf{W}, \mathbf{g}, \mathbf{A})$. This is defined as the supremum of all
securely achievable rates for (W, g, A).

H. $(\epsilon, \delta)$-Secure Capacity

A rate $R$ is said to be $(\epsilon, \delta)$-securely achievable for a
stego-channel $(\mathbf{W}, \mathbf{g}, \mathbf{A}) = \{(W^n, g_n, A^n)\}_{n=1}^{\infty}$ if there exists a
sequence of $(n, M_n, \epsilon_n, \delta_n)$-codes such that:

1) $\limsup_{n \to \infty} \epsilon_n \le \epsilon$

2) $\limsup_{n \to \infty} \delta_n \le \delta$

3) $\liminf_{n \to \infty} \frac{1}{n} \log M_n \ge R$

II. Secure Capacity Formula 
A. Information-Spectrum Methods 

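The information-spectrum method [6], [7], [8], [9], [10]
is a generalization of information theory created to apply
to systems where either the channel or its inputs are not
necessarily ergodic or stationary. Its use is required in this
work because the steganalyzer is not assumed to have any
ergodic or stationary properties.

The information-spectrum method uses the general source
(also called general sequence) defined as,

$$ \mathbf{X} := \left\{ X^n = \left( X_1^{(n)}, \ldots, X_n^{(n)} \right) \right\}_{n=1}^{\infty}, \tag{14} $$

where each $X_m^{(n)}$ is a random variable defined over alphabet
$\mathcal{X}$. It is important to note that the general source makes no
assumptions about consistency, ergodicity, or stationarity.

The information-spectrum method also uses two novel
quantities defined for sequences of random variables, called the
limsup and liminf in probability.

The limsup in probability of a sequence of random variables
$\{Z_n\}_{n=1}^{\infty}$ is defined as,

$$ \text{p-}\limsup_{n \to \infty} Z_n := \inf \left\{ \alpha : \lim_{n \to \infty} \Pr\{Z_n > \alpha\} = 0 \right\}. $$

Similarly, the liminf in probability of a sequence of random
variables $\{Z_n\}_{n=1}^{\infty}$ is,

$$ \text{p-}\liminf_{n \to \infty} Z_n := \sup \left\{ \beta : \lim_{n \to \infty} \Pr\{Z_n < \beta\} = 0 \right\}. $$

The spectral sup-entropy rate of a general source
$\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ is defined as,

$$ \overline{H}(\mathbf{X}) := \text{p-}\limsup_{n \to \infty} \frac{1}{n} \log \frac{1}{p_{X^n}(X^n)}. \tag{15} $$

Analogously, the spectral inf-entropy rate of a general
source $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ is defined as,

$$ \underline{H}(\mathbf{X}) := \text{p-}\liminf_{n \to \infty} \frac{1}{n} \log \frac{1}{p_{X^n}(X^n)}. \tag{16} $$

The spectral entropy rate has a number of natural properties,
such as, for any $\mathbf{X}$, $\overline{H}(\mathbf{X}) \ge \underline{H}(\mathbf{X}) \ge 0$ [6, Thm. 1.7.2].

The spectral sup-mutual information rate for the pair of
general sequences $(\mathbf{X}, \mathbf{Y}) = \{(X^n, Y^n)\}_{n=1}^{\infty}$ is defined as,

$$ \overline{I}(\mathbf{X}; \mathbf{Y}) := \text{p-}\limsup_{n \to \infty} \frac{1}{n}\, i(X^n; Y^n), \tag{17} $$

where,

$$ i(X^n; Y^n) := \log \frac{p_{Y^n|X^n}(Y^n|X^n)}{p_{Y^n}(Y^n)}. \tag{18} $$

Likewise the spectral inf-mutual information rate for the
pair of general sequences $(\mathbf{X}, \mathbf{Y}) = \{(X^n, Y^n)\}_{n=1}^{\infty}$ is
defined as,

$$ \underline{I}(\mathbf{X}; \mathbf{Y}) := \text{p-}\liminf_{n \to \infty} \frac{1}{n}\, i(X^n; Y^n). \tag{19} $$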

B. Information-Spectrum Results 

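This section lists some of the fundamental results from
information-spectrum theory [6] that will be used in the
remainder of the paper.

$$ \underline{H}(\mathbf{X}) \le \liminf_{n \to \infty} \frac{1}{n} H(X^n) \tag{20} $$

$$ \underline{I}(\mathbf{X}; \mathbf{Y}) \le \overline{H}(\mathbf{Y}) - \overline{H}(\mathbf{Y}|\mathbf{X}) \tag{21} $$

$$ \underline{I}(\mathbf{X}; \mathbf{Y}) \ge \underline{H}(\mathbf{Y}) - \overline{H}(\mathbf{Y}|\mathbf{X}) \tag{22} $$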

C. Secure Sequences 

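1) Secure Input Sequences: For a given stego-channel
(W, g, A), a general source $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ is called $\delta$-secure
if the resulting $\mathbf{Y} = \{Y^n\}_{n=1}^{\infty}$ satisfies,

$$ \limsup_{n \to \infty} \Pr\{g_n(Y^n) = 1\} \le \delta, \tag{23} $$

or either of the following equivalent conditions,

$$ \limsup_{n \to \infty} p_{Y^n}(\mathcal{I}_{g_n}) \le \delta, \tag{24} $$

or

$$ \liminf_{n \to \infty} p_{Y^n}(\mathcal{P}_{g_n}) \ge 1 - \delta. \tag{25} $$

The set $S_\delta$ of all general sources that are $\delta$-secure is
defined as,

$$ S_\delta := \left\{ \mathbf{X} : \limsup_{n \to \infty} \sum_{\mathbf{x} \in \mathcal{X}^n} W^n(\mathcal{I}_{g_n}|\mathbf{x})\, p_{X^n}(\mathbf{x}) \le \delta \right\}, \tag{26} $$

where $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$.

The set for $\delta = 0$ is called the secure input set and denoted $S_0$.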
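Whether a particular input source is secure can be explored numerically for simple cases. The sketch below assumes an i.i.d. Bernoulli(q) input, a memoryless binary symmetric encoder-noise channel with crossover p, and the sum steganalyzer with a floor(n/2) threshold; under those assumptions the detection probability is a binomial tail. The function name and parameter values are illustrative only.

```python
from math import comb


def detection_probability(n, q, p):
    """Probability that more than floor(n/2) channel outputs equal 1.

    Assumes i.i.d. Bernoulli(q) inputs through a BSC with crossover p,
    so each output is 1 with probability r = q(1 - p) + (1 - q)p.
    """
    r = q * (1 - p) + (1 - q) * p
    return sum(comb(n, k) * r**k * (1 - r)**(n - k)
               for k in range(n // 2 + 1, n + 1))


for n in (1, 11, 51, 201):
    print(n, detection_probability(n, q=0.1, p=0.1),
          detection_probability(n, q=0.9, p=0.1))
```

With q = 0.1 the detection probability decays toward zero as n grows, consistent with membership in the secure input set, while with q = 0.9 it tends to one.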
2) Secure Output Sequences: For a given steganalyzer
sequence $\mathbf{g} = \{g_n\}_{n=1}^{\infty}$, a general sequence $\mathbf{Y} = \{Y^n\}_{n=1}^{\infty}$ is
called $\delta$-secure if,

$$ \limsup_{n \to \infty} \Pr\{g_n(Y^n) = 1\} \le \delta. \tag{27} $$

The set $T_\delta$ of all $\delta$-secure general output sequences is defined
as,

$$ T_\delta := \left\{ \mathbf{Y} = \{Y^n\}_{n=1}^{\infty} : \limsup_{n \to \infty} p_{Y^n}(\mathcal{I}_{g_n}) \le \delta \right\}. \tag{28} $$

The set for $\delta = 0$ is called the secure output set and denoted $T_0$.



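D. $(\epsilon, \delta)$-Secure Capacity

We are now prepared to derive the first fundamental result,
the $(\epsilon, \delta)$-secure capacity. This capacity will make use of the
following definition,

$$ J(R|\mathbf{X}) := \limsup_{n \to \infty} \Pr\left\{ \frac{1}{n}\, i(X^n; Z^n) \le R \right\} = \limsup_{n \to \infty} \Pr\left\{ \frac{1}{n} \log \frac{Q^n(Z^n|X^n)}{p_{Z^n}(Z^n)} \le R \right\}. $$

The proof is the general $\epsilon$-capacity proof given by Han [6],
[7], with the restriction to the secure input set.

Theorem 2.1 ($(\epsilon, \delta)$-Secure Capacity): The $(\epsilon, \delta)$-secure
capacity $C(\epsilon, \delta|\mathbf{W}, \mathbf{g}, \mathbf{A})$ of a stego-channel (W, g, A) is
given by,

$$ C(\epsilon, \delta|\mathbf{W}, \mathbf{g}, \mathbf{A}) = \sup_{\mathbf{X} \in S_\delta} \sup \{R : J(R|\mathbf{X}) \le \epsilon\}, \tag{29} $$

for any $0 \le \epsilon < 1$ and $0 \le \delta < 1$.

Proof: This proof is based on [6], [7]. Let
$C = \sup_{\mathbf{X} \in S_\delta} \sup \{R : J(R|\mathbf{X}) \le \epsilon\}$, and $Q^n = A^n \circ W^n$.

Achievability: Choose any $\epsilon \ge 0$ and $\delta \ge 0$.

Let $R = C - 3\gamma$, for any $\gamma > 0$. By the definition of $C$ we
have that there exists an $\mathbf{X} \in S_\delta$ such that,

$$ \sup\{R : J(R|\mathbf{X}) \le \epsilon\} \ge C - \gamma = R + 2\gamma. \tag{30} $$

Similarly we may find an $R' > R + \gamma$ such that $J(R'|\mathbf{X}) \le \epsilon$.
As $J(R|\mathbf{X})$ is monotonically increasing in $R$,

$$ J(R + \gamma|\mathbf{X}) \le \epsilon. \tag{31} $$

Next by letting $M_n = e^{nR}$ we have that,

$$ \liminf_{n \to \infty} \frac{1}{n} \log M_n \ge R. $$

Using Feinstein's Lemma [11] we have that there exists an
$(n, M_n, \epsilon_n)$-code with,

$$ \epsilon_n \le \Pr\left\{ \frac{1}{n} \log \frac{Q^n(Z^n|X^n)}{p_{Z^n}(Z^n)} \le \frac{1}{n} \log M_n + \gamma \right\} + e^{-n\gamma}. \tag{32} $$

As $\frac{1}{n} \log M_n = R$ for all $n$ we have,

$$ \epsilon_n \le \Pr\left\{ \frac{1}{n} \log \frac{Q^n(Z^n|X^n)}{p_{Z^n}(Z^n)} \le R + \gamma \right\} + e^{-n\gamma}. \tag{33} $$

Taking the lim sup of each side we have,

$$ \limsup_{n \to \infty} \epsilon_n \le J(R + \gamma|\mathbf{X}), \tag{34} $$

which together with $J(R + \gamma|\mathbf{X}) \le \epsilon$ shows that $\limsup_{n \to \infty} \epsilon_n \le \epsilon$.
Finally, since $\mathbf{X} \in S_\delta$ we have that,

$$ \limsup_{n \to \infty} p_{Y^n}(\mathcal{I}_{g_n}) \le \delta. \tag{35} $$

Converse: Let $R > C$, and choose $\gamma > 0$ such that
$R - 2\gamma > C$. Assume that $R$ is $(\epsilon, \delta)$-achievable, so there exists
an $(n, M_n, \epsilon_n, \delta_n)$-code such that,

$$ \liminf_{n \to \infty} \frac{1}{n} \log M_n \ge R, \tag{36} $$

$$ \limsup_{n \to \infty} \epsilon_n \le \epsilon, \tag{37} $$

and

$$ \limsup_{n \to \infty} \delta_n \le \delta. \tag{38} $$

Let $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ where each $X^n$ is a uniform distribution
over codewords $\mathcal{C}_n$, and let $\mathbf{Z}$ be the corresponding channel
output. Since $R - 2\gamma > C \ge \sup\{R : J(R|\mathbf{X}) \le \epsilon\}$,

$$ J(R - 2\gamma|\mathbf{X}) > \epsilon. \tag{39} $$

The Feinstein Dual [6], [7] states that for a uniformly
distributed input $X^n$ over an $(n, M_n, \epsilon_n)$-code and output $Z^n$
corresponding to channel $\mathbf{Q}$, the following holds for all $n$,

$$ \epsilon_n \ge \Pr\left\{ \frac{1}{n} \log \frac{Q^n(Z^n|X^n)}{p_{Z^n}(Z^n)} \le \frac{1}{n} \log M_n - \gamma \right\} - e^{-n\gamma}. \tag{40} $$

Using the property of lim inf we have that for all $n > n_0$,

$$ \frac{1}{n} \log M_n \ge R - \gamma. \tag{41} $$

For $n > n_0$ we have,

$$ \epsilon_n \ge \Pr\left\{ \frac{1}{n} \log \frac{Q^n(Z^n|X^n)}{p_{Z^n}(Z^n)} \le R - 2\gamma \right\} - e^{-n\gamma}. \tag{42} $$

Taking the lim sup of both sides, and considering (39), we
see that,

$$ \limsup_{n \to \infty} \epsilon_n > \epsilon, \tag{43} $$

which contradicts (37), so no rate $R > C$ is $(\epsilon, \delta)$-achievable. ■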

A fundamental assumption in the above proof is that the
encoder has knowledge of the detection function. From
a steganalysis perspective this allows one to determine the
"worst-case scenario" for the amount of information that may
be sent through a channel.

E. Secure Capacity 

The next result deals with a special case of (e, (5)-secure 
capacity, namely the one where e = 5 = 0. The secure 
capacity is the maximum amount of information that may be 
sent over a channel with arbitrarily small probabilities of error 
and detection. 

The four potential formulations for our model are shown
in Figure 3. The capacity of the stego-channel (W, g, A) is
shown in Theorem 2.2 to follow and specialized to the other
cases in Theorems 2.3, 2.4 and 2.5.

The results of these capacities are summarized in Table II.

TABLE II 
Secure Capacity Formulas 



Fig. 3. Stegochannels. Panels: (W, g, A), (·, g, A), (W, g, ·) and (·, g, ·).



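Theorem 2.2 (Secure Capacity): The secure capacity
$C(\mathbf{W}, \mathbf{g}, \mathbf{A})$ of a stego-channel (W, g, A) is given by,

$$ C(\mathbf{W}, \mathbf{g}, \mathbf{A}) = \sup_{\mathbf{X} \in S_0} \underline{I}(\mathbf{X}; \mathbf{Z}). \tag{44} $$

Proof: We apply Theorem 2.1 with $\epsilon = 0$ and $\delta = 0$.
This gives,

$$ C(\mathbf{W}, \mathbf{g}, \mathbf{A}) = C(0, 0|\mathbf{W}, \mathbf{g}, \mathbf{A}) \tag{45a} $$
$$ = \sup_{\mathbf{X} \in S_0} \sup\{R : J(R|\mathbf{X}) \le 0\} \tag{45b} $$
$$ = \sup_{\mathbf{X} \in S_0} \sup\left\{R : \limsup_{n \to \infty} \Pr\left\{ \frac{1}{n}\, i(X^n; Z^n) \le R \right\} \le 0 \right\} \tag{45c} $$
$$ = \sup_{\mathbf{X} \in S_0} \underline{I}(\mathbf{X}; \mathbf{Z}). \tag{45d} $$

Here the last line is due to the definition of p-lim inf. ■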

Theorem 2.3 (Noiseless Encoder, Active Adversary): The
secure capacity of a stego-channel (·, g, A) with a
noiseless encoder and active adversary, denoted $C(\cdot, \mathbf{g}, \mathbf{A})$, is given
by,

$$ C(\cdot, \mathbf{g}, \mathbf{A}) = \sup_{\mathbf{Y} \in T_0} \underline{I}(\mathbf{Y}; \mathbf{Z}). \tag{46} $$

Proof: Apply Theorem 2.2 with $\mathbf{X} = \mathbf{Y}$ and $S_0 = T_0$. ■

Theorem 2.4 (Passive Adversary): The secure channel
capacity with a passive adversary, denoted $C(\mathbf{W}, \mathbf{g})$, of a
stego-channel (W, g, ·) is given by,

$$ C(\mathbf{W}, \mathbf{g}) = \sup_{\mathbf{X} \in S_0} \underline{I}(\mathbf{X}; \mathbf{Y}). \tag{47} $$

Proof: Since the adversary is passive, we have that $\mathbf{Z} = \mathbf{Y}$. ■

Theorem 2.5 (Noiseless Encoder, Passive Adversary):
The secure capacity of a stego-channel (·, g, ·) with a
noiseless encoder and passive adversary, denoted $C(\cdot, \mathbf{g})$, is
given by,

$$ C(\cdot, \mathbf{g}) = \sup_{\mathbf{X} \in S_0} \underline{I}(\mathbf{X}; \mathbf{Y}). \tag{48} $$

Proof: Since the adversary is passive, we have that $\mathbf{Z} = \mathbf{Y}$,
and since there is no encoder noise we have that $\mathbf{X} = \mathbf{Y}$
and $S_0 = T_0$. ■



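Secure Capacity                                                                   | Noise     | Adversary | Theorem
$C(\mathbf{W}, \mathbf{g}, \mathbf{A}) = \sup_{\mathbf{X} \in S_0} \underline{I}(\mathbf{X}; \mathbf{Z})$   | W         | A         | 2.2
$C(\cdot, \mathbf{g}, \mathbf{A}) = \sup_{\mathbf{Y} \in T_0} \underline{I}(\mathbf{Y}; \mathbf{Z})$        | Noiseless | A         | 2.3
$C(\mathbf{W}, \mathbf{g}) = \sup_{\mathbf{X} \in S_0} \underline{I}(\mathbf{X}; \mathbf{Y})$               | W         | Passive   | 2.4
$C(\cdot, \mathbf{g}) = \sup_{\mathbf{Y} \in T_0} \underline{H}(\mathbf{Y})$                                | Noiseless | Passive   | 2.5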



F. Strong Converse 

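A stego-channel (W, g, A) is said to satisfy the $\epsilon$-strong
converse property if for any $R > C(0, \delta|\mathbf{W}, \mathbf{g}, \mathbf{A})$, every
$(n, M_n, \epsilon_n, \delta_n)$-code with,

$$ \liminf_{n \to \infty} \frac{1}{n} \log M_n \ge R, $$

and

$$ \limsup_{n \to \infty} \delta_n \le \delta, $$

we have,

$$ \lim_{n \to \infty} \epsilon_n = 1. $$

If a channel satisfies the $\epsilon$-strong converse,

$$ C(\epsilon, \delta|\mathbf{W}, \mathbf{g}, \mathbf{A}) = C(0, \delta|\mathbf{W}, \mathbf{g}, \mathbf{A}), \tag{49} $$

for any $\epsilon \in [0, 1)$.

Theorem 2.6 ($\epsilon$-Strong Converse): A stego-channel
(W, g, A) satisfies the $\epsilon$-strong converse property (for
a fixed $\delta$) if and only if,

$$ \sup_{\mathbf{X} \in S_\delta} \underline{I}(\mathbf{X}; \mathbf{Z}) = \sup_{\mathbf{X} \in S_\delta} \overline{I}(\mathbf{X}; \mathbf{Z}). \tag{50} $$

This proof is essentially the $\epsilon$-strong converse [6], [7] with a
restriction to the secure input set. See details in Appendix A.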

G. Bounds 

We now derive a number of useful bounds on the spectral
entropy of an output sequence in relation to the permissible
set. These bounds will then be used to prove general bounds
for steganographic systems and see further application in
Section III.

Theorem 2.7 (Spectral inf-entropy bound): For a discrete
$\mathbf{g} = \{\mathcal{P}_n\}_{n=1}^{\infty}$ with corresponding secure output set $T_0$,

$$ \sup_{\mathbf{Y} \in T_0} \underline{H}(\mathbf{Y}) = \liminf_{n \to \infty} \frac{1}{n} \log |\mathcal{P}_n|. \tag{51} $$

See Appendix B for proof.

Theorem 2.8 (Spectral sup-entropy bound): For a discrete
$\mathbf{g} = \{\mathcal{P}_n\}_{n=1}^{\infty}$ with corresponding secure output set $T_0$,

$$ \sup_{\mathbf{Y} \in T_0} \overline{H}(\mathbf{Y}) = \limsup_{n \to \infty} \frac{1}{n} \log |\mathcal{P}_n|. \tag{52} $$

See Appendix C for proof.
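H. Capacity Bounds

This section presents a number of fundamental bounds on
the secure capacity of a stego-channel based on the properties
of that channel.

We make use of the following lemma.

Lemma 2.1: For a stego-channel (W, g, A) the following
hold,

$$ \underline{I}(\mathbf{X}; \mathbf{Z}) \le \underline{I}(\mathbf{X}; \mathbf{Y}), \tag{53} $$

$$ \underline{I}(\mathbf{X}; \mathbf{Z}) \le \underline{I}(\mathbf{Y}; \mathbf{Z}). \tag{54} $$

Proof: We note that the general distributions form a
Markov chain, $\mathbf{X} \to \mathbf{Y} \to \mathbf{Z}$ (this is said to hold when, for all $n$,
$X^n$ and $Z^n$ are conditionally independent given $Y^n$). A property of the
inf-information rate [6] is,

$$ \underline{I}(\mathbf{X}; \mathbf{Z}) \le \underline{I}(\mathbf{X}; \mathbf{Y}), \tag{55} $$

when $\mathbf{X} \to \mathbf{Y} \to \mathbf{Z}$.

Since $\mathbf{X} \to \mathbf{Y} \to \mathbf{Z}$ implies $\mathbf{Z} \to \mathbf{Y} \to \mathbf{X}$ we also have,

$$ \underline{I}(\mathbf{X}; \mathbf{Z}) \le \underline{I}(\mathbf{Y}; \mathbf{Z}). \tag{56} $$
■

The first capacity bound gives an upper bound based on the
sup-entropy of the secure input set.

Theorem 2.9 (Input Sup-Entropy Bound): For a
stego-channel (W, g, A) the secure capacity is bounded as,

$$ C(\mathbf{W}, \mathbf{g}, \mathbf{A}) \le \sup_{\mathbf{X} \in S_0} \overline{H}(\mathbf{X}). \tag{57} $$

Proof: Using (21) and the property that $\overline{H}(\mathbf{X}|\mathbf{Z}) \ge 0$ we
have,

$$ C(\mathbf{W}, \mathbf{g}, \mathbf{A}) = \sup_{\mathbf{X} \in S_0} \underline{I}(\mathbf{X}; \mathbf{Z}) \le \sup_{\mathbf{X} \in S_0} \left\{ \overline{H}(\mathbf{X}) - \overline{H}(\mathbf{X}|\mathbf{Z}) \right\} \le \sup_{\mathbf{X} \in S_0} \overline{H}(\mathbf{X}). $$
■

The next theorem gives two upper bounds on the capacity
based on the sup-entropy of the secure input and output sets.

Theorem 2.10 (Output Sup-Entropy Bounds): For a
stego-channel (W, g, A) the secure capacity is bounded as,

$$ C(\mathbf{W}, \mathbf{g}, \mathbf{A}) \le \sup_{\mathbf{X} \in S_0} \overline{H}(\mathbf{Y}) \tag{59a} $$
$$ \le \sup_{\mathbf{Y} \in T_0} \overline{H}(\mathbf{Y}). \tag{59b} $$

Proof: Using (21) and the property that $\overline{H}(\mathbf{Y}|\mathbf{X}) \ge 0$ we
have,

$$ C(\mathbf{W}, \mathbf{g}, \mathbf{A}) = \sup_{\mathbf{X} \in S_0} \underline{I}(\mathbf{X}; \mathbf{Z}) \le \sup_{\mathbf{X} \in S_0} \underline{I}(\mathbf{X}; \mathbf{Y}) \le \sup_{\mathbf{X} \in S_0} \left\{ \overline{H}(\mathbf{Y}) - \overline{H}(\mathbf{Y}|\mathbf{X}) \right\} \le \sup_{\mathbf{X} \in S_0} \overline{H}(\mathbf{Y}) \le \sup_{\mathbf{Y} \in T_0} \overline{H}(\mathbf{Y}). $$

Here the second step uses (53), and the final line follows since if
$\mathbf{X} \in S_0$ and $\mathbf{X} \xrightarrow{\mathbf{W}} \mathbf{Y}$ then $\mathbf{Y} \in T_0$. ■

The next corollary specializes the above theorem when the
permissible set is finite.

Corollary 2.1 (Discrete Permissible Set Bound):
For a given discrete stego-channel
$(\mathbf{W}, \mathbf{g}, \mathbf{A}) = \{(W^n, g_n, A^n)\}_{n=1}^{\infty}$ the secure capacity is bounded
from above as,

$$ C(\mathbf{W}, \mathbf{g}, \mathbf{A}) \le \limsup_{n \to \infty} \frac{1}{n} \log |\mathcal{P}_{g_n}|. \tag{61} $$

Proof: Combining Theorem 2.8 and line (59b) of
Theorem 2.10 gives the desired result. ■

The next theorem provides an intuitive result dealing with
the capacity of two stego-channels having related steganalyzers.

Theorem 2.11 (Permissible Set Relation): For two
stego-channels, (W, g, A) and (W, v, A), if $\mathcal{P}_{g_n} \subseteq \mathcal{P}_{v_n}$ for all
but finitely many $n$, then,

$$ C(\mathbf{W}, \mathbf{g}, \mathbf{A}) \le C(\mathbf{W}, \mathbf{v}, \mathbf{A}). \tag{62} $$

Proof: Let $\{f_n\}_{n=1}^{\infty}$ and $\{\phi_n\}_{n=1}^{\infty}$ be a sequence of
encoding and decoding functions that achieves $C(\mathbf{W}, \mathbf{g}, \mathbf{A})$.
Such a sequence exists by the definition of secure capacity.
The following definitions will be used for $i = 1, \ldots, M_n$,

$$ \mathbf{u}_i = f_n(i), \qquad \mathcal{D}_i = \phi_n^{-1}(\{i\}). $$

The probability of error for this sequence is given by (12),

$$ \epsilon_n = \frac{1}{M_n} \sum_{i=1}^{M_n} Q^n(\mathcal{D}_i^c|\mathbf{u}_i), $$

where $Q^n = A^n \circ W^n$.

This value is independent of the permissible sets, and if
$\epsilon_n \to 0$ for the stego-channel (W, g, A) then it also goes
to zero for (W, v, A).

Next we know that the probability of detection for
(W, g, A) is given by (13),

$$ \delta_n^{g} = \frac{1}{M_n} \sum_{i=1}^{M_n} W^n(\mathcal{I}_{g_n}|\mathbf{u}_i), $$

and that $\delta_n^{g} \to 0$.

Since $\mathcal{P}_{g_n} \subseteq \mathcal{P}_{v_n}$ for all $n > N$, we have that $\mathcal{I}_{v_n} \subseteq \mathcal{I}_{g_n}$
for $n > N$ and

$$ W^n(\mathcal{I}_{v_n}|\mathbf{x}) \le W^n(\mathcal{I}_{g_n}|\mathbf{x}), \quad \forall n > N,\ \mathbf{x} \in \mathcal{X}^n. \tag{63} $$

Using this, we may bound the probability of detection for
(W, v, A) and $n > N$ as,

$$ \delta_n^{v} = \frac{1}{M_n} \sum_{i=1}^{M_n} W^n(\mathcal{I}_{v_n}|\mathbf{u}_i) \le \frac{1}{M_n} \sum_{i=1}^{M_n} W^n(\mathcal{I}_{g_n}|\mathbf{u}_i) = \delta_n^{g}. $$

Since $\delta_n^{g} \to 0$ we see that $\delta_n^{v} \to 0$ as well. ■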
I. Applications

1) Composite steganalyzers: The final theorem of the
previous section is intuitively pleasing and leads to some immediate
results. An example of this is the composite steganalyzer
pictured in Figure 4.

In this system, two steganalyzers, g and v, are used
sequentially on the corrupted stego-signal. If either of these
steganalyzers is triggered, the message is considered steganographic.
We will denote the composite stego-channel of this system as
(W, h, A).

As one would expect, the capacity of the composite
channel, $C(\mathbf{W}, \mathbf{h}, \mathbf{A})$, is no larger than either $C(\mathbf{W}, \mathbf{g}, \mathbf{A})$ or
$C(\mathbf{W}, \mathbf{v}, \mathbf{A})$. This is shown in the next theorem.

Theorem 2.12 (Composite Stego-Channel): For a
composite stego-channel (W, h, A) defined by g and v, the following
inequality holds,

$$ C(\mathbf{W}, \mathbf{h}, \mathbf{A}) \le \min \{C(\mathbf{W}, \mathbf{g}, \mathbf{A}),\ C(\mathbf{W}, \mathbf{v}, \mathbf{A})\}. \tag{65} $$

Proof: We first show that $C(\mathbf{W}, \mathbf{h}, \mathbf{A}) \le C(\mathbf{W}, \mathbf{g}, \mathbf{A})$.
The permissible set of the composite is equal to the
intersection of the permissible sets of the base detection functions,

$$ \mathcal{P}_{h_n} = \mathcal{P}_{g_n} \cap \mathcal{P}_{v_n}, \quad \forall n, \tag{66} $$

thus we have that $\mathcal{P}_{h_n} \subseteq \mathcal{P}_{g_n}$ and we may apply Theorem 2.11
to state,

$$ C(\mathbf{W}, \mathbf{h}, \mathbf{A}) \le C(\mathbf{W}, \mathbf{g}, \mathbf{A}). $$

The above argument may be applied using $\mathcal{P}_{h_n} \subseteq \mathcal{P}_{v_n}$ to
show $C(\mathbf{W}, \mathbf{h}, \mathbf{A}) \le C(\mathbf{W}, \mathbf{v}, \mathbf{A})$. ■
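The set relation used in the proof, that the composite permissible set is the intersection of the two base permissible sets, can be checked by enumeration. In the sketch below the detector `g` is the sum steganalyzer (with an assumed floor(n/2) threshold) and `v` is a hypothetical detector that flags any two adjacent ones; both are illustrative stand-ins rather than detectors taken from the text.

```python
from itertools import product
from math import log2


def g(y):
    """Sum detector (assumed threshold): flag when ones exceed floor(n/2)."""
    return 1 if sum(y) > len(y) // 2 else 0


def v(y):
    """Hypothetical second detector: flag any two adjacent ones."""
    return 1 if any(a == 1 and b == 1 for a, b in zip(y, y[1:])) else 0


def h(y):
    """Composite detector: triggered if either base detector is triggered."""
    return max(g(y), v(y))


def permissible(detector, n):
    return {y for y in product((0, 1), repeat=n) if detector(y) == 0}


n = 8
P_g, P_v, P_h = permissible(g, n), permissible(v, n), permissible(h, n)
print(P_h == P_g & P_v)
for name, P in (("g", P_g), ("v", P_v), ("h", P_h)):
    print(name, len(P), log2(len(P)) / n)   # finite-n rate (1/n) log2 |P_n|
```

The printed finite-n rates are only indicative; the theorem itself concerns the limiting capacities.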

2) Two Noise Systems: We briefly present and discuss an
interesting case that is somewhat counter-intuitive. Consider
the channel shown in Figure 5. In this case there is distortion
A after the encoder and a second distortion B before the
second steganalyzer. In the previous section it was shown
that in the composite steganalyzer the addition of a second
steganalyzer (Figure 5) lowers the capacity of the
stego-channel. A surprising result for the two noise system is that
this may not be the case. In fact, the addition of a second
distortion may increase the capacity of a stego-channel!

To see this, consider the two steganalyzers g and v. Assume
that g classifies signals with positive means as
steganographic, while v classifies signals with negative means as
steganographic. If these detection functions were in series,
the permissible set (of the composite detection function)
is empty. This is because a signal cannot have both a positive
and a negative mean. Now consider a specific, deterministic
distortion that flips the sign of the signal, $B^n(-\mathbf{y}|\mathbf{y}) = 1$. Now we may
send any signal we wish, as long as its mean is negative: it passes g,
and after the sign flip its mean is positive, so it passes v as well.
So in some instances, it is possible for the addition of a distortion
to actually increase the capacity.
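The effect of an intermediate sign-flipping distortion can be illustrated with a toy enumeration. The sketch below assumes signals over {-1, +1}, a detector `g` that flags non-negative means, a detector `v` that flags non-positive means (so zero-mean signals are rejected by both), and a deterministic distortion that negates every sample; these modelling choices are assumptions made for illustration.

```python
from itertools import product
from statistics import mean


def g(y):
    """Flag signals whose mean is not negative (assumed boundary handling)."""
    return 1 if mean(y) >= 0 else 0


def v(y):
    """Flag signals whose mean is not positive (assumed boundary handling)."""
    return 1 if mean(y) <= 0 else 0


def flip(y):
    """Deterministic distortion B: negate every sample."""
    return tuple(-s for s in y)


signals = list(product((-1, 1), repeat=3))

# Detectors directly in series: a signal must pass both g and v.
series = [y for y in signals if g(y) == 0 and v(y) == 0]

# Distortion B inserted between the detectors: v sees the flipped signal.
with_distortion = [y for y in signals if g(y) == 0 and v(flip(y)) == 0]

print(len(series), len(with_distortion))
print(with_distortion)
```

Without the distortion no signal survives both detectors, while with it every negative-mean signal does, so the permissible set grows from empty to non-empty.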

III. Noiseless Channels 

This section investigates the capacity of the noiseless
stego-channel shown in Figure 6. In this system there is no
encoder-noise and the adversary is passive. This means that not only
does the decoder receive exactly what the encoder sends, but
the steganalyzer does as well.

This section finds the secure capacity of this system, and 
then derives a number of intuitive bounds relating to this 
capacity. 



A. Secure Noiseless Capacity 

Theorem 3.1 (Secure Noiseless Capacity): For a discrete
noiseless channel (·, g, ·) the secure capacity is given by,

$$ C(\cdot, \mathbf{g}) = \liminf_{n \to \infty} \frac{1}{n} \log |\mathcal{P}_{g_n}|. \tag{67} $$

Proof: The proof follows directly from Theorem 2.5 and
Theorem 2.7. ■

Example 2 (Capacity of the Sum Steganalyzer): We now
use this result to find the secure noiseless capacity of the
sum steganalyzer of Example 1. The size of the permissible
set for $n$ is equal to the number of different ways we may
arrange up to $\lfloor n/2 \rfloor$ 1s into $n$ positions,

$$ |\mathcal{P}_{g_n}| = \sum_{i \,:\, 0 \le i \le \lfloor n/2 \rfloor} \binom{n}{i}. \tag{68} $$

For $n$ even, $|\mathcal{P}_{g_n}| = 2^{n-1} + \frac{1}{2} \binom{n}{n/2}$, and for $n$ odd,
$|\mathcal{P}_{g_n}| = 2^{n-1}$. Applying the noiseless theorem,

$$ C(\cdot, \mathbf{g}) = \liminf_{n \to \infty} \frac{1}{n} \log |\mathcal{P}_{g_n}| = \lim_{n \to \infty} \frac{1}{n} \log 2^{n-1} = 1 \text{ bit/use}. \tag{69} $$
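The counting argument of Example 2 can be checked numerically. The sketch below counts binary sequences with at most floor(n/2) ones, compares the count with the even and odd closed forms, and prints the finite-n rate with base-2 logarithms; the function names are illustrative.

```python
from math import comb, log2


def permissible_size(n):
    """Number of binary n-sequences with at most floor(n/2) ones."""
    return sum(comb(n, i) for i in range(n // 2 + 1))


def closed_form(n):
    """2^(n-1) + C(n, n/2)/2 for even n, 2^(n-1) for odd n."""
    if n % 2 == 0:
        return 2 ** (n - 1) + comb(n, n // 2) // 2
    return 2 ** (n - 1)


for n in (1, 2, 3, 4, 10, 100, 1000):
    size = permissible_size(n)
    print(n, size == closed_form(n), log2(size) / n)
```

The rate approaches 1 bit per use from below for odd n and slightly faster for even n, since the permissible set always contains at least half of all binary sequences.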
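B. $\epsilon$-Strong Converse for Noiseless Channels

We now present a fundamental result for discrete noiseless
channels regarding the $\epsilon$-strong converse property. It gives
the necessary and sufficient conditions for a noiseless
stego-channel to satisfy the $\epsilon$-strong converse property.

Theorem 3.2 (Noiseless $\epsilon$-Strong Converse): A discrete
noiseless stego-channel (·, g, ·) satisfies the $\epsilon$-strong converse
property if and only if,

$$ C(\cdot, \mathbf{g}) = \lim_{n \to \infty} \frac{1}{n} \log |\mathcal{P}_{g_n}|. \tag{70} $$

Proof: Since the channel is noiseless, $\mathbf{X} = \mathbf{Y} = \mathbf{Z}$, and we
have,

$$ \sup_{\mathbf{X} \in S_0} \underline{I}(\mathbf{X}; \mathbf{Z}) = \sup_{\mathbf{Y} \in T_0} \underline{H}(\mathbf{Y}), \tag{71} $$

$$ \sup_{\mathbf{X} \in S_0} \overline{I}(\mathbf{X}; \mathbf{Z}) = \sup_{\mathbf{Y} \in T_0} \overline{H}(\mathbf{Y}). \tag{72} $$

First assume that the stego-channel satisfies the $\epsilon$-strong
converse property. This gives,

$$ \sup_{\mathbf{Y} \in T_0} \underline{H}(\mathbf{Y}) = \sup_{\mathbf{X} \in S_0} \underline{I}(\mathbf{X}; \mathbf{Z}) \tag{73a} $$
$$ = \sup_{\mathbf{X} \in S_0} \overline{I}(\mathbf{X}; \mathbf{Z}) \tag{73b} $$
$$ = \sup_{\mathbf{Y} \in T_0} \overline{H}(\mathbf{Y}), \tag{73c} $$

where (73a) follows from (71), (73b) from Theorem 2.6, and (73c) from (72).

The capacity is then,

$$ C(\cdot, \mathbf{g}) = \sup_{\mathbf{Y} \in T_0} \underline{H}(\mathbf{Y}) = \liminf_{n \to \infty} \frac{1}{n} \log |\mathcal{P}_{g_n}| $$
$$ = \sup_{\mathbf{Y} \in T_0} \overline{H}(\mathbf{Y}) = \limsup_{n \to \infty} \frac{1}{n} \log |\mathcal{P}_{g_n}| $$
$$ = \lim_{n \to \infty} \frac{1}{n} \log |\mathcal{P}_{g_n}|, $$

using Theorem 2.7, (73) and Theorem 2.8 in turn.
Here the final line results as the lim inf and lim sup coincide.

For the other direction assume that
$C(\cdot, \mathbf{g}) = \lim_{n \to \infty} \frac{1}{n} \log |\mathcal{P}_{g_n}|$, which gives,

$$ C(\cdot, \mathbf{g}) = \sup_{\mathbf{X} \in S_0} \underline{I}(\mathbf{X}; \mathbf{Z}) $$
$$ = \sup_{\mathbf{Y} \in T_0} \underline{H}(\mathbf{Y}) $$
$$ = \lim_{n \to \infty} \frac{1}{n} \log |\mathcal{P}_{g_n}| $$
$$ = \limsup_{n \to \infty} \frac{1}{n} \log |\mathcal{P}_{g_n}| $$
$$ = \sup_{\mathbf{Y} \in T_0} \overline{H}(\mathbf{Y}) \quad \text{(by Theorem 2.8)} $$
$$ = \sup_{\mathbf{X} \in S_0} \overline{I}(\mathbf{X}; \mathbf{Z}). $$

Thus, $\sup_{\mathbf{X} \in S_0} \underline{I}(\mathbf{X}; \mathbf{Z}) = \sup_{\mathbf{X} \in S_0} \overline{I}(\mathbf{X}; \mathbf{Z})$ and by
Theorem 2.6 the stego-channel satisfies the $\epsilon$-strong converse
property. ■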

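Example 3 (Sum Steganalyzer): We now determine if the
sum steganalyzer satisfies the ε-strong converse.

From Example 2 the size of the permissible set is,

\[ \left| \mathcal{P}_{g_n} \right| = \begin{cases} 2^{n-1} + \frac{1}{2}\binom{n}{n/2}, & \text{for even } n, \\ 2^{n-1}, & \text{for odd } n. \end{cases} \tag{75} \]

We will make use of Stirling's approximation,

\[ n! = \sqrt{2\pi n} \left( \frac{n}{e} \right)^n e^{\lambda_n}, \tag{76} \]

where $1/(12n+1) < \lambda_n < 1/(12n)$.
For $n$ even,

\begin{align}
\binom{n}{n/2} = \frac{n!}{\left( (n/2)! \right)^2} &= \frac{\sqrt{2\pi n} \left( \frac{n}{e} \right)^n e^{\lambda_n}}{2\pi (n/2) \left( \frac{n}{2e} \right)^n e^{2\lambda_{n/2}}} \tag{77} \\
&< 2^n \frac{2e}{\sqrt{2\pi n}}. \tag{78}
\end{align}

This gives,

\[ \left| \mathcal{P}_{g_n} \right| < 2^{n-1} \left( 1 + \frac{2e}{\sqrt{2\pi n}} \right), \tag{79} \]

and, since the odd-$n$ sizes are smaller still,

\begin{align}
\limsup_{n \to \infty} \frac{1}{n} \log \left| \mathcal{P}_{g_n} \right| &\le \limsup_{n \to \infty} \frac{1}{n} \log \left( 2^{n-1} \left( 1 + \frac{2e}{\sqrt{2\pi n}} \right) \right) \tag{80} \\
&= 1. \tag{81}
\end{align}

This shows,

\[ \liminf_{n \to \infty} \frac{1}{n} \log \left| \mathcal{P}_{g_n} \right| = 1 \ge \limsup_{n \to \infty} \frac{1}{n} \log \left| \mathcal{P}_{g_n} \right|. \tag{82} \]

Since the liminf and limsup coincide, the limit is indeed a true
one and this stego-channel satisfies the ε-strong converse.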
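A quick numerical sanity check of this example is sketched below. It assumes the bound $\binom{n}{n/2} < 2^n \cdot 2e/\sqrt{2\pi n}$ obtained from Stirling's approximation and compares it, in the log domain, against exact binomial coefficients; the helper names are illustrative only.

```python
from math import comb, e, log2, pi, sqrt


def log2_central_binomial(n: int) -> float:
    """log2 of the exact central binomial coefficient C(n, n/2), n even."""
    return log2(comb(n, n // 2))


def log2_stirling_bound(n: int) -> float:
    """log2 of the upper bound 2^n * 2e / sqrt(2*pi*n)."""
    return n + log2(2 * e / sqrt(2 * pi * n))


def rate_bits(n: int) -> float:
    """(1/n) log2 of the permissible-set size for the sum steganalyzer."""
    size = sum(comb(n, i) for i in range(n // 2 + 1))
    return log2(size) / n


if __name__ == "__main__":
    for n in (10, 100, 1000, 2000, 2001):
        print(n, round(rate_bits(n), 5))
```

Both the even and the odd subsequences of the printed rates approach 1, which is the coincidence of lim inf and lim sup used in the example.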

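C. Properties of the Noiseless DMSC

In this section we briefly investigate the secure capacity of
the discrete memoryless stego-channel (cf. II-F2b).

Theorem 3.3 (Noiseless DMSC Secure Capacity): For the
stego-channel $(\cdot, \mathbf{g}, \cdot)$ with $\mathbf{g} = \{g\}$, the secure capacity is
given by,

\[ C(\cdot, \mathbf{g}) = \log \left| \mathcal{P}_g \right|, \tag{83} \]

and furthermore this stego-channel satisfies the strong
converse.

Proof: As the channel is noiseless and the input alphabet
is finite we may use Theorem 3.1,

\[ C(\cdot, \mathbf{g}) = \liminf_{n \to \infty} \frac{1}{n} \log \left| \mathcal{P}_{g_n} \right|. \tag{84} \]

Note that by (7) we have for all $n$,

\[ \frac{1}{n} \log \left| \mathcal{P}_{g_n} \right| = \frac{1}{n} \log \left| \mathcal{P}_g \times \mathcal{P}_g \times \cdots \times \mathcal{P}_g \right| = \frac{1}{n} \log \left| \mathcal{P}_g \right|^n = \log \left| \mathcal{P}_g \right|. \]

Thus,

\[ C(\cdot, \mathbf{g}) = \log \left| \mathcal{P}_g \right|. \tag{85} \]

We also have that

\[ C(\cdot, \mathbf{g}) = \liminf_{n \to \infty} \frac{1}{n} \log \left| \mathcal{P}_{g_n} \right| = \log \left| \mathcal{P}_g \right| = \lim_{n \to \infty} \frac{1}{n} \log \left| \mathcal{P}_{g_n} \right|, \tag{86} \]

thus by Theorem 3.2 the stego-channel satisfies the strong
converse. ■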
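The product structure used in this proof can be illustrated with a toy memoryless steganalyzer. The alphabet and single-letter permissible set below are hypothetical, and the sketch assumes that the $n$-letter steganalyzer triggers as soon as any single letter does, so that the permissible set is the $n$-fold Cartesian product of $\mathcal{P}_g$.

```python
from itertools import product
from math import log2

ALPHABET = (0, 1, 2, 3)          # hypothetical channel alphabet
PERMISSIBLE_LETTERS = {0, 1, 2}  # hypothetical single-letter permissible set P_g


def g(letter: int) -> int:
    """Single-letter steganalyzer: 0 if the letter is permissible, else 1."""
    return 0 if letter in PERMISSIBLE_LETTERS else 1


def g_n(y: tuple) -> int:
    """Memoryless extension (assumed): triggers if any letter triggers."""
    return max(g(letter) for letter in y)


def permissible_set(n: int) -> list:
    """Enumerate all length-n sequences that do not trigger g_n."""
    return [y for y in product(ALPHABET, repeat=n) if g_n(y) == 0]


if __name__ == "__main__":
    for n in range(1, 6):
        size = len(permissible_set(n))
        print(n, size, log2(size) / n)
```

For every $n$ the normalised log-size equals $\log_2 |\mathcal{P}_g| = \log_2 3$ bits, so the lim inf in (84) is an ordinary limit.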

IV. Additive Noise Stego-Channels 

In this section we evaluate the capacity of a particular
stego-channel, shown in Figure 7. In this channel, both the
encoder-noise and attack-noise are additive and independent of the
channel input.

A. Additive Noise 

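Denote the sum of two general sequences $\mathbf{X} = \{X^n = (X_1^{(n)}, \ldots, X_n^{(n)})\}_{n=1}^{\infty}$ and $\mathbf{Y} = \{Y^n = (Y_1^{(n)}, \ldots, Y_n^{(n)})\}_{n=1}^{\infty}$ as,

\[ \mathbf{X} + \mathbf{Y} := \left\{ X^n + Y^n = \left( X_1^{(n)} + Y_1^{(n)}, \ldots, X_n^{(n)} + Y_n^{(n)} \right) \right\}_{n=1}^{\infty}. \tag{87} \]

Letting the encoder-noise be denoted as $\mathbf{N}_e = \{N_e^n\}_{n=1}^{\infty}$
and the attack-noise denoted as $\mathbf{N}_a = \{N_a^n\}_{n=1}^{\infty}$ we have the
following relations,

\begin{align*}
\mathbf{Y} &= \mathbf{X} + \mathbf{N}_e, \\
\mathbf{Z} &= \mathbf{Y} + \mathbf{N}_a = \mathbf{X} + \mathbf{N}_e + \mathbf{N}_a = \mathbf{X} + \mathbf{N},
\end{align*}

where $\mathbf{N} = \{N^n\}_{n=1}^{\infty} = \mathbf{N}_e + \mathbf{N}_a$.

As the noises are independent of the stego-signal, we may
use the following simplification,

\[ P_{Z^n | X^n}(z | x) = P_{N^n}(z - x), \]

leading to the following simplifications in spectral-entropies,

\begin{align}
\underline{H}(\mathbf{Z}|\mathbf{X}) &= \underline{H}(\mathbf{N}), \tag{88} \\
\overline{H}(\mathbf{Z}|\mathbf{X}) &= \overline{H}(\mathbf{N}). \tag{89}
\end{align}

We now use these simplifications to present a useful capacity
result for additive noise channels.

Theorem 4.1: For an additive noise stego-channel defined with
$\mathbf{N}_e + \mathbf{N}_a = \mathbf{N}$, if $\mathbf{N}$ satisfies the strong converse (i.e. $\underline{H}(\mathbf{N}) = \overline{H}(\mathbf{N})$) then the capacity is,

\[ C(\mathbf{W}, \mathbf{g}, \mathbf{A}) = \sup_{\mathbf{X} \in \mathcal{S}_0} \left\{ \underline{H}(\mathbf{Z}) \right\} - \overline{H}(\mathbf{N}). \tag{90} \]

Proof: First we find a lower bound as,

\begin{align}
C(\mathbf{W}, \mathbf{g}, \mathbf{A}) &\ge \sup_{\mathbf{X} \in \mathcal{S}_0} \left\{ \underline{H}(\mathbf{Z}) - \overline{H}(\mathbf{Z}|\mathbf{X}) \right\} \tag{91} \\
&= \sup_{\mathbf{X} \in \mathcal{S}_0} \left\{ \underline{H}(\mathbf{Z}) \right\} - \overline{H}(\mathbf{N}). \tag{92}
\end{align}

Next we upperbound the capacity as,

\begin{align}
C(\mathbf{W}, \mathbf{g}, \mathbf{A}) &\le \sup_{\mathbf{X} \in \mathcal{S}_0} \left\{ \underline{H}(\mathbf{Z}) - \underline{H}(\mathbf{Z}|\mathbf{X}) \right\} \tag{93} \\
&= \sup_{\mathbf{X} \in \mathcal{S}_0} \left\{ \underline{H}(\mathbf{Z}) \right\} - \underline{H}(\mathbf{N}). \tag{94}
\end{align}

By assumption $\underline{H}(\mathbf{N}) = \overline{H}(\mathbf{N})$ and combining (92)
and (94) we have the desired result. ■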
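The two bounds in this proof rest on standard information-spectrum inequalities. The sketch below spells them out, assuming the general capacity formula $C(\mathbf{W}, \mathbf{g}, \mathbf{A}) = \sup_{\mathbf{X} \in \mathcal{S}_0} \underline{I}(\mathbf{X};\mathbf{Z})$ used elsewhere in this work, and writing p-liminf and p-limsup for the limit inferior and limit superior in probability.

```latex
\begin{align*}
\frac{1}{n} i(X^n; Z^n)
  &= \underbrace{\frac{1}{n} \log \frac{1}{P_{Z^n}(Z^n)}}_{A_n}
   - \underbrace{\frac{1}{n} \log \frac{1}{P_{Z^n|X^n}(Z^n|X^n)}}_{B_n}, \\
\operatorname{p-liminf}(A_n - B_n)
  &\ge \operatorname{p-liminf} A_n - \operatorname{p-limsup} B_n
  \;\Longrightarrow\;
  \underline{I}(\mathbf{X};\mathbf{Z}) \ge \underline{H}(\mathbf{Z}) - \overline{H}(\mathbf{Z}|\mathbf{X}), \\
\operatorname{p-liminf}(A_n - B_n)
  &\le \operatorname{p-liminf} A_n - \operatorname{p-liminf} B_n
  \;\Longrightarrow\;
  \underline{I}(\mathbf{X};\mathbf{Z}) \le \underline{H}(\mathbf{Z}) - \underline{H}(\mathbf{Z}|\mathbf{X}).
\end{align*}
```

Taking the supremum over $\mathbf{X} \in \mathcal{S}_0$ and substituting (88) and (89) yields the lower and upper bounds of the proof.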

B. AWGN Example 

The general formula of the previous section is now applied
to the commonly found additive white Gaussian noise
channel. The detector is motivated by the use of spread
spectrum steganography [12], or more generally stochastic
modulation [13].

The encoder-noise and attack-channel to be considered are
additive white Gaussian noise (AWGN). For a stego-signal,
$x = (x_1, \ldots, x_n)$, the corrupted stego-signal is given by,

\[ y = (x_1 + n_1, \ldots, x_n + n_n), \]

where each $n_i \sim \mathcal{N}(0, \sigma_e^2)$, and all are independent.

The transition probabilities of the encoder-noise are given by,

\[ W^n(y|x) = \frac{1}{(2\pi\sigma_e^2)^{n/2}} \exp\left\{ -\frac{1}{2\sigma_e^2} \sum_{i=1}^{n} (y_i - x_i)^2 \right\}. \tag{95} \]

Similarly, the attack-channel is AWGN as $\mathcal{N}(0, \sigma_a^2)$ so the
transition probabilities are,

\[ A^n(z|y) = \frac{1}{(2\pi\sigma_a^2)^{n/2}} \exp\left\{ -\frac{1}{2\sigma_a^2} \sum_{i=1}^{n} (z_i - y_i)^2 \right\}. \tag{96} \]

1) Variance Steganalyzer: In stochastic modulation, a 
pseudo-noise is modulated by a message and added to the 
cover-signal. This is done as the presence of noise in signal 
processing applications is a common occurrence. 

If the passive adversary has knowledge of the distribution 
of the cover-signal and suspects stochastic modulation, they 
would expect the variance of a stego-signal will differ from 
a cover-signal. If the passive adversary knows the variance 
of the cover-distribution, they could design a steganalyzer to 
trigger if the variance of a test signal is higher than expected. 
For example, when testing the signal $y = (y_1, \ldots, y_n)$ the
variance steganalyzer operates as,

\[ g_n(y) = \begin{cases} 1, & \text{if } \frac{1}{n} \sum_{i=1}^{n} y_i^2 > c, \\ 0, & \text{else.} \end{cases} \tag{97} \]

That is to say, if the empirical variance of a test signal is
above a certain threshold, the signal is considered
steganographic.
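A minimal sketch of such a detector follows. It assumes zero-mean signals, so that the empirical power $\frac{1}{n}\sum_i y_i^2$ plays the role of the empirical variance, and it uses a small Monte Carlo loop (with hypothetical parameter names) to estimate how often Gaussian signals of a given total variance trigger the detector.

```python
import random


def variance_steganalyzer(y, c):
    """Return 1 (steganographic) if the empirical power of y exceeds c, else 0."""
    return 1 if sum(v * v for v in y) / len(y) > c else 0


def detection_rate(signal_var, noise_var, c, n, trials, seed=0):
    """Fraction of trials in which an i.i.d. Gaussian signal plus
    independent Gaussian noise triggers the steganalyzer."""
    rng = random.Random(seed)
    sigma = (signal_var + noise_var) ** 0.5  # independent Gaussians add in variance
    hits = 0
    for _ in range(trials):
        y = [rng.gauss(0.0, sigma) for _ in range(n)]
        hits += variance_steganalyzer(y, c)
    return hits / trials


if __name__ == "__main__":
    # Total variance well below the threshold: essentially never detected.
    print(detection_rate(0.4, 0.1, c=1.0, n=2000, trials=50))
    # Total variance well above the threshold: essentially always detected.
    print(detection_rate(1.5, 0.5, c=1.0, n=2000, trials=50))
```

The threshold comparison is strict, so a signal whose empirical power equals $c$ exactly is classified as non-steganographic.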

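2) Additive Gaussian Channel Active Adversary: In this
section we derive the capacity under an active adversary.
Assume that the adversary uses an additive i.i.d. Gaussian
noise with variance $\sigma_a^2$ while the encoder noise is additive
i.i.d. Gaussian with variance $\sigma_e^2$.

Let $\mathbf{N}_e = \{N_e\}$ where $N_e \sim \mathcal{N}(0, \sigma_e^2)$ and $\mathbf{N}_a = \{N_a\}$
where $N_a \sim \mathcal{N}(0, \sigma_a^2)$.

Let $\mathbf{N} = \mathbf{N}_e + \mathbf{N}_a = \{N^n = N_e^n + N_a^n\}_{n=1}^{\infty}$. Since both
$\mathbf{N}_e$ and $\mathbf{N}_a$ are i.i.d. as $\mathcal{N}(0, \sigma_e^2)$ and $\mathcal{N}(0, \sigma_a^2)$, respectively,
their sum is i.i.d. as $\mathcal{N}(0, \sigma_e^2 + \sigma_a^2)$, i.e. $\mathbf{N} = \{N\}$ with
$N \sim \mathcal{N}(0, \sigma_e^2 + \sigma_a^2)$. We therefore have the
following relations,

\[ \underline{H}(\mathbf{N}) = \overline{H}(\mathbf{N}) = H(N) = \frac{1}{2} \log 2\pi e \left( \sigma_e^2 + \sigma_a^2 \right). \tag{98} \]

Since $\underline{H}(\mathbf{N}) = \overline{H}(\mathbf{N})$ we see that the noise sequence
satisfies the strong converse property.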

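3) Active Adversary Capacity: We now derive the secure
capacity of the above stego-channel. Since the noises are i.i.d.,
the general sequence $\mathbf{N}$ will satisfy the strong converse and
allow the use of Theorem 4.1.

The formal proof is then followed by a discussion of the
results and a description using the classic sphere packing
intuition.

Theorem 4.2: For the stego-channel $(\mathbf{W}, \mathbf{g}, \mathbf{A}) = \{(W^n, g_n, A^n)\}_{n=1}^{\infty}$ with $W^n$ and $A^n$ defined by (95)
and (96) respectively, and $g_n$ defined by (97) the secure
capacity is,

\[ C(\mathbf{W}, \mathbf{g}, \mathbf{A}) = \frac{1}{2} \log \frac{c + \sigma_a^2}{\sigma_e^2 + \sigma_a^2}. \tag{99} \]

Proof: From Theorem 4.1 and (98) we have,

\begin{align}
C(\mathbf{W}, \mathbf{g}, \mathbf{A}) &= \sup_{\mathbf{X} \in \mathcal{S}_0} \left\{ \underline{H}(\mathbf{Z}) \right\} - \overline{H}(\mathbf{N}) \tag{100} \\
&= \sup_{\mathbf{X} \in \mathcal{S}_0} \left\{ \underline{H}(\mathbf{Z}) \right\} - \frac{1}{2} \log 2\pi e \left( \sigma_e^2 + \sigma_a^2 \right). \tag{101}
\end{align}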

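Achievability:

Let $\mathbf{X} = \{X\}$ where $X \sim \mathcal{N}(0, c - \sigma_e^2)$. Thus $\mathbf{Y} = \mathbf{X} + \mathbf{N}_e = \{Y\}$ with $Y = X + N_e$. By addition of independent Gaussians,
$Y \sim \mathcal{N}(0, c)$. This gives,

\[ \Pr\left\{ \frac{1}{n} \sum_{i=1}^{n} Y_i^2 > c \right\} \to 0, \tag{102} \]

and we see that $\mathbf{X} \in \mathcal{S}_0$. (Strictly, the law of large numbers yields this limit when the variance of $Y$ is $c - \delta$ for any $\delta > 0$; the bound below is then approached as $\delta \to 0$.) Similarly, $\mathbf{Z} = \mathbf{N}_a + \mathbf{Y} = \{Z\}$
with $Z = X + N_e + N_a$. Again by addition of independent
Gaussians we have $Z \sim \mathcal{N}(0, c + \sigma_a^2)$.
This allows for a lower bound of,

\begin{align}
C(\mathbf{W}, \mathbf{g}, \mathbf{A}) &= \sup_{\mathbf{X} \in \mathcal{S}_0} \underline{H}(\mathbf{Z}) - \frac{1}{2} \log \left( 2\pi e \left( \sigma_e^2 + \sigma_a^2 \right) \right) \tag{103a} \\
&\ge \underline{H}(\mathbf{Z}) - \frac{1}{2} \log \left( 2\pi e \left( \sigma_e^2 + \sigma_a^2 \right) \right) \tag{103b} \\
&= \frac{1}{2} \log \frac{c + \sigma_a^2}{\sigma_e^2 + \sigma_a^2}. \tag{103c}
\end{align}

Footnote: Recall that for a general sequence, $\mathbf{X} = \{X^n = (X_1^{(n)}, \ldots, X_n^{(n)})\}_{n=1}^{\infty}$,
when $\mathbf{X} = \{X\}$ is written it means that each $X_i^{(n)}$ is independent and
identically distributed as $X$.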



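Converse:

To find the upperbound we will make use of a number of
simple lemmas:

Lemma 4.1: For a given stego-channel with secure input
distribution set $\mathcal{S}_0$ and secure output distribution set $\mathcal{T}_0$, the
following holds,

\[ \sup_{\mathbf{X} \in \mathcal{S}_0} \underline{H}(\mathbf{Z}) \le \sup_{\mathbf{Y} \in \mathcal{T}_0} \underline{H}(\mathbf{Z}). \tag{104} \]

Proof: By definition for any $\mathbf{X} \in \mathcal{S}_0$ and $\mathbf{X} \xrightarrow{\mathbf{W}} \mathbf{Y}$, we
have $\mathbf{Y} \in \mathcal{T}_0$. ■

Lemma 4.2: For $Y^n = (Y_1^{(n)}, Y_2^{(n)}, \ldots, Y_n^{(n)})$ let $K_{ij}^{(n)}$
be the covariance between $Y_i^{(n)}$ and $Y_j^{(n)}$, that is $K_{ij}^{(n)} := E\{Y_i^{(n)} Y_j^{(n)}\}$. For the stego-channel defined above, if $\mathbf{Y} = \{Y^n\}_{n=1}^{\infty} \in \mathcal{T}_0$ we have for any $\gamma > 0$ there exists some $N$
such that for all $n > N$,

\[ \frac{1}{n} \sum_{i=1}^{n} K_{ii}^{(n)} + \sigma_a^2 < c + \sigma_a^2 + \gamma. \tag{105} \]

Proof: It suffices to show,

\[ \frac{1}{n} \sum_{i=1}^{n} K_{ii}^{(n)} < c + \gamma, \tag{106} \]

for all $n$ greater than some $N$.

To show this, assume that no such $N$ exists, thus we have
a subsequence $n_k$ such that,

\[ \frac{1}{n_k} \sum_{i=1}^{n_k} K_{ii}^{(n_k)} \ge c + \gamma. \tag{107} \]

This means that,

\[ E\left\{ \frac{1}{n_k} \sum_{i=1}^{n_k} \left( Y_i^{(n_k)} \right)^2 \right\} \ge c + \gamma, \]

which in turn implies that,

\[ \Pr\left\{ g_{n_k}(Y^{n_k}) = 0 \right\} \to 0. \]

This is a contradiction as it shows $\mathbf{Y} = \{Y^n\}_{n=1}^{\infty} \notin \mathcal{T}_0$. ■

Lemma 4.3: For any $Z^n = (Z_1^{(n)}, \ldots, Z_n^{(n)})$ with $C_{ii}^{(n)} = E\{(Z_i^{(n)})^2\}$,

\[ \frac{1}{n} H(Z^n) \le \frac{1}{2} \log \left( 2\pi e \, \frac{1}{n} \sum_{i=1}^{n} C_{ii}^{(n)} \right). \tag{108} \]

Proof: From [14, Chap. 9.6] we have,

\[ \frac{1}{n} H(Z^n) \le \frac{1}{2n} \log \left( (2\pi e)^n \prod_{i=1}^{n} C_{ii}^{(n)} \right). \tag{109} \]

The result follows from application of the arithmetic-geometric
inequality. ■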
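Lemma 4.4: For the above stego-channel, any $\mathbf{Y} \in \mathcal{T}_0$ and
any $\epsilon > 0$ we have,

\[ \liminf_{n \to \infty} \frac{1}{n} H(Z^n) \le \frac{1}{2} \log 2\pi e \left( c + \sigma_a^2 \right) + \epsilon, \tag{110} \]

where $\mathbf{Z} = \{Z^n\}_{n=1}^{\infty}$ and $\mathbf{Y} \xrightarrow{\mathbf{A}} \mathbf{Z}$.

Proof: Let any $\epsilon > 0$ be given and choose $\gamma > 0$ such
that,

\[ \gamma < \left( c + \sigma_a^2 \right) \left( e^{2\epsilon} - 1 \right), \]

this gives,

\[ \frac{1}{2} \log 2\pi e \left( c + \sigma_a^2 + \gamma \right) < \frac{1}{2} \log 2\pi e \left( c + \sigma_a^2 \right) + \epsilon. \tag{111} \]

Letting $C_{ij}^{(n)} = E\{Z_i^{(n)} Z_j^{(n)}\}$ and $K_{ij}^{(n)} = E\{Y_i^{(n)} Y_j^{(n)}\}$
we note that

\[ C_{ii}^{(n)} = K_{ii}^{(n)} + \sigma_a^2. \tag{112} \]

This gives,

\begin{align}
\frac{1}{n} H(Z^n) &\le \frac{1}{2} \log \left( 2\pi e \, \frac{1}{n} \sum_{i=1}^{n} C_{ii}^{(n)} \right) \tag{113} \\
&= \frac{1}{2} \log \left( 2\pi e \left( \frac{1}{n} \sum_{i=1}^{n} K_{ii}^{(n)} + \sigma_a^2 \right) \right) \tag{114} \\
&\le \frac{1}{2} \log \left( 2\pi e \left( c + \sigma_a^2 + \gamma \right) \right) \tag{115} \\
&\le \frac{1}{2} \log 2\pi e \left( c + \sigma_a^2 \right) + \epsilon, \tag{116}
\end{align}

where (113) follows from Lemma 4.3, (114) from (112), and (116) from (111).
The inequality of (115) holds for all but a finite number of $n$
by Lemma 4.2. ■

We now show the upperbound:

Beginning with the specialization of Theorem 4.1,

\begin{align}
C(\mathbf{W}, \mathbf{g}, \mathbf{A}) &= \sup_{\mathbf{X} \in \mathcal{S}_0} \left\{ \underline{H}(\mathbf{Z}) \right\} - \frac{1}{2} \log 2\pi e \left( \sigma_e^2 + \sigma_a^2 \right) \tag{117a} \\
&\le \sup_{\mathbf{Y} \in \mathcal{T}_0} \left\{ \underline{H}(\mathbf{Z}) \right\} - \frac{1}{2} \log 2\pi e \left( \sigma_e^2 + \sigma_a^2 \right) \tag{117b} \\
&\le \sup_{\mathbf{Y} \in \mathcal{T}_0} \liminf_{n \to \infty} \frac{1}{n} H(Z^n) - \frac{1}{2} \log 2\pi e \left( \sigma_e^2 + \sigma_a^2 \right) \tag{117c} \\
&\le \frac{1}{2} \log \frac{c + \sigma_a^2}{\sigma_e^2 + \sigma_a^2} + \epsilon, \tag{117d}
\end{align}

where (117b) uses Lemma 4.1 and (117d) uses Lemma 4.4.

Combining (103c) and (117d) we have for any $\epsilon > 0$,

\[ \frac{1}{2} \log \frac{c + \sigma_a^2}{\sigma_e^2 + \sigma_a^2} \le C(\mathbf{W}, \mathbf{g}, \mathbf{A}) \le \frac{1}{2} \log \frac{c + \sigma_a^2}{\sigma_e^2 + \sigma_a^2} + \epsilon, \]

and we see that $C(\mathbf{W}, \mathbf{g}, \mathbf{A}) = \frac{1}{2} \log \frac{c + \sigma_a^2}{\sigma_e^2 + \sigma_a^2}$. ■

TABLE III
Gaussian Additive Noise Capacities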

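Channel       Encoder Noise    Attack Noise    Secure Capacity
C(W, g, A)    $\sigma_e^2$     $\sigma_a^2$    $\frac{1}{2} \log \frac{c + \sigma_a^2}{\sigma_e^2 + \sigma_a^2}$
C(W, g)       $\sigma_e^2$     none            $\frac{1}{2} \log \frac{c}{\sigma_e^2}$
C(·, g, A)    none             $\sigma_a^2$    $\frac{1}{2} \log \frac{c + \sigma_a^2}{\sigma_a^2}$

(Capacity entries are those given by Theorem 4.2 with the absent noise variance set to zero.)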

4) Noise Cases: We now use this theorem to investigate
the behavior of the capacity under different noise conditions.

5) Large Attack Case: We first consider the case where $\sigma_a^2$
is much larger than both $c$ and $\sigma_e^2$. This gives,

\[ C(\mathbf{W}, \mathbf{g}, \mathbf{A}) = \frac{1}{2} \log \frac{c + \sigma_a^2}{\sigma_e^2 + \sigma_a^2} \approx \frac{1}{2} \log \frac{\sigma_a^2}{\sigma_a^2} = 0. \]

This shows that when the attack noise is large enough, the
capacity of the stego-channel goes to zero. Intuitively this is
due to the fact that the variance steganalyzer places a power
constraint (of $c$) on any signals it allows to pass. If the attack
noise is much larger than $c$, a message simply cannot be
transmitted with enough power to overcome that noise and
$\epsilon_n \to 0$ is impossible.

6) Large Encoder-Noise Case: Next we consider the case
where $\sigma_e^2 > c$.

Since $\frac{c + \sigma_a^2}{\sigma_e^2 + \sigma_a^2} < 1$, we have $\log \frac{c + \sigma_a^2}{\sigma_e^2 + \sigma_a^2} < 0$. This gives,

\[ C(\mathbf{W}, \mathbf{g}, \mathbf{A}) = \frac{1}{2} \log \frac{c + \sigma_a^2}{\sigma_e^2 + \sigma_a^2} < 0. \]

As capacity is always greater than or equal to zero, we see that
the capacity of this system is indeed zero. This is because no
matter what codeword is sent, the encoder-noise will corrupt
it into the impermissible set and the steganalyzer will be
triggered, that is $\delta_n \to 0$ is impossible.

This case illuminates the importance of the additional
constraint in communication over a stego-channel, as even if
$\epsilon_n \to 0$ the capacity of the stego-channel is still zero.

7) Noiseless Case: Consider the noiseless case where $\sigma_e^2 = \sigma_a^2 = 0$. This gives,

\[ \lim_{\sigma_e^2, \sigma_a^2 \to 0} C(\mathbf{W}, \mathbf{g}, \mathbf{A}) = \lim_{\sigma_e^2, \sigma_a^2 \to 0} \frac{1}{2} \log \frac{c + \sigma_a^2}{\sigma_e^2 + \sigma_a^2} = \infty. \]

Since the channel is noiseless and the permissible set size is
infinite (as well as input and output alphabets), the capacity
is unbounded.
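The three regimes above can be reproduced with a few lines of code. The sketch below evaluates the capacity expression of Theorem 4.2 in bits per use and clamps negative values to zero, as discussed in the large encoder-noise case; the function and parameter names are illustrative, and $c > 0$ is assumed.

```python
from math import inf, log2


def secure_capacity(c: float, var_e: float, var_a: float) -> float:
    """0.5 * log2((c + var_a) / (var_e + var_a)), clamped at zero.

    c     : steganalyzer power threshold (assumed positive)
    var_e : encoder-noise variance
    var_a : attack-noise variance
    """
    noise = var_e + var_a
    if noise == 0:
        return inf  # noiseless case: capacity is unbounded
    return max(0.0, 0.5 * log2((c + var_a) / noise))


if __name__ == "__main__":
    print(secure_capacity(1.0, 0.25, 0.0))   # moderate encoder noise only
    print(secure_capacity(1.0, 0.25, 1e9))   # large attack noise: near zero
    print(secure_capacity(1.0, 2.0, 0.5))    # encoder noise above c: zero
    print(secure_capacity(1.0, 0.0, 0.0))    # noiseless: unbounded
```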

8) Geometric Intuition: In this section we present some
geometric intuition to the previous results, similar to the case
of the classic additive Gaussian noise [14], [15].

We will consider the case of only an encoder-noise of $\sigma_e^2$,
shown in Figure 9.

From the above theorem we see that,

\[ C(\mathbf{W}, \mathbf{g}) = \frac{1}{2} \log \frac{c}{\sigma_e^2}. \tag{118} \]

The most basic element will be the volume of an $n$
dimensional sphere of radius $r$. In this case, the volume is
equal to $A_n r^n$ where $A_n$ is a constant dependent only on the
dimension $n$.
The fundamental question is: what is the capacity of the
stego-channel, or how many codewords can we reliably and
safely use? To answer this, we must consider the two constraints
on a secure system: error probability and detection
probability.

9) Error Probability: Since we have that $\mathcal{X}^n = \mathcal{Y}^n = \mathbb{R}^n$,
we may view each codeword as a point in $\mathbb{R}^n$. When we
transmit a given codeword, we may think of the addition of
noise as moving the point around in that space. As the power
of the noise is $\sigma_e^2$, the probability that the received codeword
has moved more than $\sqrt{n\sigma_e^2}$ away from where it started goes
to zero as $n \to \infty$. This means a received codeword will
likely be contained in a sphere of radius $\sqrt{n\sigma_e^2}$ centered on
the transmitted codeword. If we receive a signal inside such
a sphere, it is likely that the transmitted codeword was the
center of that sphere. In this manner we can define a coding
system by choosing the codewords such that their spheres do
not overlap. This results in no confusion during decoding and
achieves the requirement of vanishing error probability.

10) Detection Probability: We begin by looking at the
permissible set. The permissible set for our $g_n$ is given by,

\[ \mathcal{P}_{g_n} = \left\{ y \in \mathcal{Y}^n : \sum_{i=1}^{n} y_i^2 \le nc \right\}. \tag{119} \]

Clearly the permissible set is a sphere of radius $\sqrt{nc}$ centered
at the origin. If a test signal falls inside this sphere it is
classified as non-steganographic, whereas if it is outside it is
considered steganographic.

The second criterion for a secure system is that the probability
of detection goes to zero. If we were to place each codeword
such that its sphere was inside the permissible set, we know
that the probability of detection will go to zero.

11) Capacity: From the above, we know that the codeword
spheres cannot overlap (to ensure no errors). We also know
that all the codeword spheres must fit inside the permissible
set (to ensure no detection). If we calculate the number of
non-overlapping spheres we may pack into the permissible set, we
will have a general idea of the number of codewords we can
use.

Since the volume of the permissible set is $A_n (nc)^{n/2}$ and the
volume of each codeword sphere is $A_n (n\sigma_e^2)^{n/2}$, we can place
approximately,

\[ \frac{A_n (nc)^{n/2}}{A_n (n\sigma_e^2)^{n/2}} = \left( \frac{c}{\sigma_e^2} \right)^{n/2} \]

non-overlapping spheres inside the permissible set.

Using the center of each sphere as a codeword, we have
$M_n$ codewords where $M_n = \left( \frac{c}{\sigma_e^2} \right)^{n/2}$.

If we consider the capacity as $C(\mathbf{W}, \mathbf{g}) = \lim_{n \to \infty} \frac{1}{n} \log M_n$
we have,

\begin{align}
C(\mathbf{W}, \mathbf{g}) &= \lim_{n \to \infty} \frac{1}{n} \log \left( \frac{c}{\sigma_e^2} \right)^{n/2} \tag{120a} \\
&= \frac{1}{2} \log \frac{c}{\sigma_e^2}, \tag{120b}
\end{align}

which agrees with the result of Theorem 4.2.
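The sphere-packing count can be evaluated in the log domain to avoid overflow in high dimensions. The sketch below uses the standard formula $A_n = \pi^{n/2} / \Gamma(n/2 + 1)$ for the volume constant of an $n$-ball; the function names are illustrative, and the ratio of volumes is only the heuristic count used in the text, not an exact packing number.

```python
from math import lgamma, log, pi, sqrt


def log_ball_volume(n: int, radius: float) -> float:
    """Natural log of the volume A_n * r^n of an n-ball of the given radius."""
    log_a_n = 0.5 * n * log(pi) - lgamma(0.5 * n + 1)
    return log_a_n + n * log(radius)


def packing_rate_bits(n: int, c: float, var_e: float) -> float:
    """(1/n) log2 of the volume ratio between the permissible sphere
    (radius sqrt(n*c)) and a codeword sphere (radius sqrt(n*var_e))."""
    log_count = log_ball_volume(n, sqrt(n * c)) - log_ball_volume(n, sqrt(n * var_e))
    return log_count / (n * log(2))


if __name__ == "__main__":
    for n in (2, 50, 1000):
        print(n, packing_rate_bits(n, c=4.0, var_e=1.0))
```

Because the constant $A_n$ cancels in the ratio, the rate equals $\frac{1}{2} \log_2 (c / \sigma_e^2)$ for every $n$, which is 1 bit per use in the example call.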



V. Previous Work Revisited 
A. Cachin Perfect Security 

In Cachin's definition of perfect security [16], the cover-signal
distribution and the stego-signal distribution are each
required to be independent and identically distributed. This
gives the following secure-input set,

\[ \mathcal{S}_0 = \left\{ \mathbf{X} = \{X\} : \lim_{n \to \infty} \frac{1}{n} D\left( S^n \| X^n \right) = 0 \right\}. \tag{121} \]

The i.i.d. property means that $D(S^n \| X^n) = n D(S \| X)$
so we see that the above is equivalent to,

\begin{align}
\mathcal{S}_0 &= \left\{ \mathbf{X} = \{X\} : D(S \| X) = 0 \right\} \tag{122} \\
&= \left\{ \mathbf{X} = \{X\} : p_S = p_X \right\}. \tag{123}
\end{align}

Since Cachin's definition does not model noise, we may
consider it as noiseless and apply Theorem 3.1,

\[ C(\mathbf{W}, \mathbf{g}) = \sup_{\mathbf{X} \in \mathcal{S}_0} \underline{H}(\mathbf{X}) = H(S). \tag{124} \]

This result states that in a system that is perfectly secure (in
Cachin's definition), the limit on the amount of information
that may be transferred each channel use is equal to the entropy
of the source. This is intuitive because in Cachin's definition
the output distribution of the encoder is constrained to be equal
to the cover-signal distribution.
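The additivity $D(S^n \| X^n) = n D(S \| X)$ that drives (122) can be checked directly on a small alphabet. The distributions below are hypothetical examples chosen for illustration; the sketch builds the $n$-fold product distributions explicitly and compares divergences in bits.

```python
from itertools import product
from math import log2


def entropy(p: dict) -> float:
    """Shannon entropy in bits of a distribution given as {symbol: prob}."""
    return -sum(v * log2(v) for v in p.values() if v > 0)


def kl_divergence(p: dict, q: dict) -> float:
    """D(p || q) in bits; assumes q is positive wherever p is."""
    return sum(v * log2(v / q[a]) for a, v in p.items() if v > 0)


def iid_extension(p: dict, n: int) -> dict:
    """Distribution of n independent draws from p, keyed by n-tuples."""
    ext = {}
    for seq in product(p, repeat=n):
        prob = 1.0
        for a in seq:
            prob *= p[a]
        ext[seq] = prob
    return ext


if __name__ == "__main__":
    p_s = {0: 0.5, 1: 0.25, 2: 0.25}   # hypothetical cover distribution
    p_x = {0: 0.25, 1: 0.25, 2: 0.5}   # hypothetical stego distribution
    n = 3
    print(kl_divergence(iid_extension(p_s, n), iid_extension(p_x, n)))
    print(n * kl_divergence(p_s, p_x))
    print(kl_divergence(p_s, p_s), entropy(p_s))
```

The normalised divergence vanishes only when the two single-letter distributions coincide, in which case the rate available to the encoder is the source entropy $H(S)$.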

B. Empirical Distribution Steganalyzer 

The empirical distribution steganalyzer is motivated by the 
fact that the empirical distribution from a stationary memory- 
less source converges to the actual distribution of that source. 
Accordingly, if the empirical distribution of the test signal 
converges to the cover-signal distribution it is considered to 
be non-steganographic. 

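Assume that $p_S$ is a discrete distribution over the finite
alphabet $\mathcal{S}$. Let a sequence, $\{s^n\}_{n=1}^{\infty}$ with each $s^n \in \mathcal{S}^n$ be
used to specify the steganalyzer for a test signal $x$ as,

\[ g_n(x) = \begin{cases} 0, & \text{if } P_{[x]} = P_{[s^n]}, \\ 1, & \text{if } P_{[x]} \ne P_{[s^n]}, \end{cases} \tag{125} \]

where $P_{[x]}$ is the empirical distribution of $x$.

The permissible set for $g_n$ is equal to the type class of $P_{[s^n]}$,
i.e.,

\[ \mathcal{P}_{g_n} = T(P_{[s^n]}) := \left\{ x \in \mathcal{X}^n : P_{[x]} = P_{[s^n]} \right\}. \tag{126} \]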
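The steganalyzer of (125) and the size of its permissible set can be sketched as follows. The reference sequence is a hypothetical example; the type class size is computed as a multinomial coefficient and compared with a brute-force enumeration and with the entropy of the type.

```python
from collections import Counter
from itertools import product
from math import factorial, log2


def g_n(x, s) -> int:
    """Empirical distribution steganalyzer: 0 if x and s (same length)
    have the same symbol counts, else 1."""
    return 0 if Counter(x) == Counter(s) else 1


def type_class_size(s) -> int:
    """Number of sequences sharing the type of s: n! / (n_1! ... n_K!)."""
    size = factorial(len(s))
    for count in Counter(s).values():
        size //= factorial(count)
    return size


def type_entropy_bits(s) -> float:
    """Entropy in bits of the empirical distribution of s."""
    n = len(s)
    return -sum((k / n) * log2(k / n) for k in Counter(s).values())


if __name__ == "__main__":
    s = (0, 0, 1, 2)  # hypothetical reference sequence
    brute = sum(1 for x in product(range(3), repeat=len(s)) if g_n(x, s) == 0)
    print(brute, type_class_size(s))
    long_s = s * 50
    print(log2(type_class_size(long_s)) / len(long_s), type_entropy_bits(long_s))
```

For the longer sequence the normalised log-size of the type class is already close to, and bounded above by, the entropy of the type.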
Theorem 5.1 (Empirical Distribution Steganalyzer Capacity):

\[ C(\mathbf{W}, \mathbf{g}) = H(S). \tag{127} \]

Proof: Since the channel is noiseless we may apply
Theorem 3.1,

\begin{align}
C(\mathbf{W}, \mathbf{g}) &= \liminf_{n \to \infty} \frac{1}{n} \log \left| \mathcal{P}_{g_n} \right| \tag{128a} \\
&= \liminf_{n \to \infty} \frac{1}{n} \log \left| T(P_{[s^n]}) \right| \tag{128b} \\
&= H(S). \tag{128c}
\end{align}

Here we have used the fact that the permissible set for
the empirical distribution detection function is the type
class in (128b). Additionally, by Varadarajan's Theorem [17],
$P_{[s^n]}(x) \to p_S(x)$ almost surely (here the convergence is
uniform in $x$ as well). This allows for the use of the type
class-entropy bound from Theorem D.1 that provides the final
result. ■

C. Moulin Steganographic Capacity

Moulin's formulation [2], [3] of the stego-channel is shown
in Figure 10. This is somewhat different than the formulation
shown in Figure 1; most notable is the presence of distortion
constraints and an absence of a distortion function prior to
the steganalyzer. Additionally, an explicit steganalyzer is not
defined and a hypothetical $X \sim p_S$ is used. In order to have
the two formulations coincide a number of simplifications are
needed for each model.

Fig. 10. Moulin Stego-channel

For our model,

• The stego-channel is noiseless
• The steganalyzer is the empirical distribution

For Moulin's model,

• Passive adversary ($D_2 = 0$)
• No distortion constraint on encoder ($D_1 = \infty$)

These changes produce the stego-channel shown in Figure 11.

Fig. 11. Equivalent Stego-channel

Theorem 5.2: For the stego-channel shown in Figure 11
the capacities of this work and Moulin's agree,

\[ C(\mathbf{W}, \mathbf{g}) = C^{STEG}(\infty, 0) = H(S). \tag{129} \]

Proof: Theorem 5.1 shows $C(\mathbf{W}, \mathbf{g}) = H(S)$.
We now show Moulin's capacity is equal to this value. In
the case of a passive adversary ($D_2 = 0$), the following is the
capacity of the stego-channel,

\[ C^{STEG}(D_1, 0) = \sup_{p(x|s) \in \mathcal{Q}'} H(X|S), \tag{130} \]

where a $p(x|s) \in \mathcal{Q}'$ is feasible if,

\[ \sum_{s,x} p(x|s) \, p_S(s) \, d(s, x) \le D_1, \tag{131} \]

and

\[ \sum_{s} p(x|s) \, p_S(s) = p_S(x). \tag{132} \]

The capacity can be found for unbounded $D_1$ as,

\begin{align}
C^{STEG}(\infty, 0) &= \sup_{p(x|s) \in \mathcal{Q}'} H(X|S) \tag{133a} \\
&= H(S) - \min_{p(x|s) \in \mathcal{Q}'} I(S;X) \tag{133b} \\
&= H(S), \tag{133c}
\end{align}

where (133b) uses $H(X|S) = H(X) - I(S;X)$ together with $H(X) = H(S)$ from (132), and the final line comes from choosing $p(x|s) = p_S(x)$. ■

VI. Conclusions

A framework for evaluating the capacity of steganographic
channels under an active adversary has been introduced. The
system considers a noise corrupting the signal before the
detection function in order to model real-world distortions
such as compression, quantization, etc.

Constraints on the encoder dealing with distortion and a
cover-signal are not considered. Instead, the focus is to develop
the theory necessary to analyze the interplay between the channel
and detection function that results in the steganographic
capacity.

The method uses an information-spectrum approach that
allows for the analysis of arbitrary detection functions and
channels. This provides machinery necessary to analyze a very
broad range of steganographic channels.

In addition to offering insight into the limits of performance
for steganographic algorithms, this formulation of capacity can
be used to analyze a different and fundamentally important
facet of steganalysis. While false alarms and missed signals
have rightfully dominated the steganalysis literature, very little
is known about the amount of information that can be sent past
these algorithms. This work presents a theory to shed light
onto this important quantity called steganographic capacity.

Appendix A
ε-Strong Converse Proof

A stego-channel $(\mathbf{W}, \mathbf{g}, \mathbf{A})$ satisfies the ε-strong converse
property (for a fixed $\delta$) if and only if,

\[ \sup_{\mathbf{X} \in \mathcal{S}_\delta} \underline{I}(\mathbf{X};\mathbf{Z}) = \sup_{\mathbf{X} \in \mathcal{S}_\delta} \overline{I}(\mathbf{X};\mathbf{Z}). \tag{A.134} \]

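Proof: First assume $\sup_{\mathbf{X} \in \mathcal{S}_\delta} \underline{I}(\mathbf{X};\mathbf{Z}) = \sup_{\mathbf{X} \in \mathcal{S}_\delta} \overline{I}(\mathbf{X};\mathbf{Z})$. Let $R = C(0, \delta | \mathbf{W}, \mathbf{g}, \mathbf{A}) + 3\gamma$ with
$\gamma > 0$. Consider an $(n, M_n, \epsilon_n, \delta_n)$-code with,

\[ \liminf_{n \to \infty} \frac{1}{n} \log M_n \ge R, \]

and

\[ \limsup_{n \to \infty} \delta_n \le \delta. \]

Let $\mathbf{X}$ represent the uniform input due to this code and $\mathbf{Z}$ the
output after the channel $\mathbf{Q} = \mathbf{A}\mathbf{W}$. From the Feinstein Dual
[6] we know,

\[ \epsilon_n \ge \Pr\left\{ \frac{1}{n} i(X^n; Z^n) \le \frac{1}{n} \log M_n - \gamma \right\} - e^{-n\gamma}. \tag{A.135} \]

We also know there exists $n_0$ such that for all $n > n_0$,

\[ \frac{1}{n} \log M_n \ge R - \gamma, \tag{A.136} \]

so for $n > n_0$,

\[ \epsilon_n \ge \Pr\left\{ \frac{1}{n} i(X^n; Z^n) \le R - 2\gamma \right\} - e^{-n\gamma}. \tag{A.137} \]

We now show that the probability term above tends to 1.
Using Theorem 2.2 we have,

\begin{align}
R &= C(0, \delta | \mathbf{W}, \mathbf{g}, \mathbf{A}) + 3\gamma \tag{A.138} \\
&= \sup_{\mathbf{X} \in \mathcal{S}_\delta} \underline{I}(\mathbf{X};\mathbf{Z}) + 3\gamma \tag{A.139} \\
&= \sup_{\mathbf{X} \in \mathcal{S}_\delta} \overline{I}(\mathbf{X};\mathbf{Z}) + 3\gamma. \tag{A.140}
\end{align}

Rewriting gives,

\[ R - 2\gamma = \sup_{\mathbf{X} \in \mathcal{S}_\delta} \overline{I}(\mathbf{X};\mathbf{Z}) + \gamma. \tag{A.141} \]

By the definition of $\overline{I}(\mathbf{X};\mathbf{Z})$ we finally have,

\[ \lim_{n \to \infty} \Pr\left\{ \frac{1}{n} i(X^n; Z^n) < R - 2\gamma \right\} = 1, \tag{A.142} \]

which together with (A.137) shows that $\lim_{n \to \infty} \epsilon_n = 1$.

For the other direction assume,

\[ \lim_{n \to \infty} \epsilon_n = 1, \tag{A.143} \]

and,

\[ \limsup_{n \to \infty} \delta_n \le \delta. \tag{A.144} \]

Set $R = C(0, \delta | \mathbf{W}, \mathbf{g}, \mathbf{A}) + \gamma$ for any $\gamma > 0$ and set $M_n = e^{nR}$. Clearly,

\[ \liminf_{n \to \infty} \frac{1}{n} \log M_n = R > C(0, \delta | \mathbf{W}, \mathbf{g}, \mathbf{A}). \]

For any $\mathbf{X} \in \mathcal{S}_\delta$ (and its corresponding $\mathbf{Z}$), using Feinstein's
Lemma [11] we have an $(n, M_n, \epsilon_n)$-code satisfying,

\[ \epsilon_n \le \Pr\left\{ \frac{1}{n} i(X^n; Z^n) \le R + \gamma \right\} + e^{-n\gamma}. \tag{A.145} \]

From the error assumption we see that,

\[ \lim_{n \to \infty} \Pr\left\{ \frac{1}{n} i(X^n; Z^n) \le R + \gamma \right\} = 1. \tag{A.146} \]

This means that,

\[ R + \gamma \ge \overline{I}(\mathbf{X};\mathbf{Z}), \tag{A.147} \]

and since $\mathbf{X} \in \mathcal{S}_\delta$ is arbitrary we have,

\[ R + \gamma \ge \sup_{\mathbf{X} \in \mathcal{S}_\delta} \overline{I}(\mathbf{X};\mathbf{Z}). \tag{A.148} \]

Substituting we have that,

\begin{align}
\sup_{\mathbf{X} \in \mathcal{S}_\delta} \overline{I}(\mathbf{X};\mathbf{Z}) &\le R + \gamma \tag{A.149} \\
&= C(0, \delta | \mathbf{W}, \mathbf{g}, \mathbf{A}) + 2\gamma \tag{A.150} \\
&= \sup_{\mathbf{X} \in \mathcal{S}_\delta} \underline{I}(\mathbf{X};\mathbf{Z}) + 2\gamma. \tag{A.151}
\end{align}

As $\gamma$ is arbitrarily close to 0 we have,

\[ \sup_{\mathbf{X} \in \mathcal{S}_\delta} \overline{I}(\mathbf{X};\mathbf{Z}) \le \sup_{\mathbf{X} \in \mathcal{S}_\delta} \underline{I}(\mathbf{X};\mathbf{Z}). \tag{A.152} \]

Also, by definition,

\[ \sup_{\mathbf{X} \in \mathcal{S}_\delta} \overline{I}(\mathbf{X};\mathbf{Z}) \ge \sup_{\mathbf{X} \in \mathcal{S}_\delta} \underline{I}(\mathbf{X};\mathbf{Z}), \tag{A.153} \]

showing equality and completing the proof. ■

Appendix B
Spectral inf-entropy bound

For a discrete $\mathbf{g} = \{\mathcal{P}_n\}_{n=1}^{\infty}$ with corresponding secure
output set $\mathcal{T}_0$,

\[ \sup_{\mathbf{Y} \in \mathcal{T}_0} \underline{H}(\mathbf{Y}) = \liminf_{n \to \infty} \frac{1}{n} \log \left| \mathcal{P}_n \right|. \]

Proof: Let $U(A)$ represent the uniform distribution on a
set $A$.

Since $\mathbf{Y}^* = \{U(\mathcal{P}_n)\}_{n=1}^{\infty} \in \mathcal{T}_0$ we have,

\[ \sup_{\mathbf{Y} \in \mathcal{T}_0} \underline{H}(\mathbf{Y}) \ge \underline{H}(\mathbf{Y}^*) = \liminf_{n \to \infty} \frac{1}{n} \log \left| \mathcal{P}_n \right|. \tag{B.154} \]

Now assume there exists $\mathbf{Y} \in \mathcal{T}_0$ with $\mathbf{Y} = \{Y^n\}_{n=1}^{\infty}$ such
that,

\[ \underline{H}(\mathbf{Y}) = \underline{H}(\mathbf{Y}^*) + 3\gamma, \tag{B.155} \]

for some $\gamma > 0$.
This means that,

\[ \lim_{n \to \infty} \Pr\left\{ \frac{1}{n} \log \frac{1}{p_{Y^n}(Y^n)} < \underline{H}(\mathbf{Y}^*) + 2\gamma \right\} = 0. \tag{B.156} \]

By (B.154) we have $\underline{H}(\mathbf{Y}^*) = \liminf_{n \to \infty} \frac{1}{n} \log |\mathcal{P}_n|$ and
from the definition of liminf we may find a subsequence
indexed by $k_n$ such that,

\[ \underline{H}(\mathbf{Y}^*) + 2\gamma > \frac{1}{k_n} \log \left| \mathcal{P}_{k_n} \right| + \gamma. \tag{B.157} \]

For any $k_n$ (B.157) holds and we have,

\[ \Pr\left\{ \frac{1}{k_n} \log \frac{1}{p_{Y^{k_n}}(Y^{k_n})} < \frac{1}{k_n} \log \left| \mathcal{P}_{k_n} \right| + \gamma \right\} \le \Pr\left\{ \frac{1}{k_n} \log \frac{1}{p_{Y^{k_n}}(Y^{k_n})} < \underline{H}(\mathbf{Y}^*) + 2\gamma \right\}. \tag{B.158} \]

Applying this result to (B.156) we have,

\[ \lim_{n \to \infty} \Pr\left\{ \frac{1}{k_n} \log \frac{1}{p_{Y^{k_n}}(Y^{k_n})} < \frac{1}{k_n} \log \left| \mathcal{P}_{k_n} \right| + \gamma \right\} = 0. \tag{B.159} \]

For any $\epsilon > 0$ and $n$ greater than some $n_0$,

\[ \Pr\left\{ p_{Y^{k_n}}(Y^{k_n}) > \frac{e^{-k_n \gamma}}{\left| \mathcal{P}_{k_n} \right|} \right\} < \epsilon. \tag{B.160} \]

Let,

\[ A_{k_n} = \left\{ y \in \mathcal{Y}^{k_n} : p_{Y^{k_n}}(y) > \frac{e^{-k_n \gamma}}{\left| \mathcal{P}_{k_n} \right|} \right\}, \tag{B.161} \]

and for all $n > n_0$, we have $p_{Y^{k_n}}(A_{k_n}) < \epsilon$.

For $n > n_0$ we may calculate the probability of the
permissible set (for the subsequence) as,

\begin{align}
p_{Y^{k_n}}(\mathcal{P}_{k_n}) &= \sum_{y \in \mathcal{P}_{k_n} \cap A_{k_n}} p_{Y^{k_n}}(y) + \sum_{y \in \mathcal{P}_{k_n} \cap A_{k_n}^c} p_{Y^{k_n}}(y) \tag{B.162a} \\
&\le p_{Y^{k_n}}(A_{k_n}) + \sum_{y \in \mathcal{P}_{k_n} \cap A_{k_n}^c} p_{Y^{k_n}}(y) \tag{B.162b} \\
&< \epsilon + \sum_{y \in \mathcal{P}_{k_n} \cap A_{k_n}^c} \frac{e^{-k_n \gamma}}{\left| \mathcal{P}_{k_n} \right|} \tag{B.162c} \\
&\le \epsilon + e^{-k_n \gamma}. \tag{B.162d}
\end{align}

This shows $p_{Y^{k_n}}(\mathcal{P}_{k_n}) \not\to 1$ and we have a contradiction showing it is impossible for $\mathbf{Y} \in \mathcal{T}_0$. ■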

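Appendix C
Spectral sup-entropy bound

For discrete $\mathbf{g} = \{\mathcal{P}_n\}_{n=1}^{\infty}$ with corresponding secure output
set $\mathcal{T}_0$,

\[ \sup_{\mathbf{Y} \in \mathcal{T}_0} \overline{H}(\mathbf{Y}) = \limsup_{n \to \infty} \frac{1}{n} \log \left| \mathcal{P}_n \right|. \]

Proof: Since $\mathbf{Y}^* = \{U(\mathcal{P}_n)\}_{n=1}^{\infty} \in \mathcal{T}_0$ we have,

\begin{align}
\sup_{\mathbf{Y} \in \mathcal{T}_0} \overline{H}(\mathbf{Y}) &\ge \overline{H}(\mathbf{Y}^*) \tag{C.163a} \\
&= \limsup_{n \to \infty} \frac{1}{n} \log \left| \mathcal{P}_n \right|. \tag{C.163b}
\end{align}

Now assume there exists $\mathbf{Y} \in \mathcal{T}_0$, with $\mathbf{Y} = \{Y^n\}_{n=1}^{\infty}$ such
that,

\[ \overline{H}(\mathbf{Y}) = \overline{H}(\mathbf{Y}^*) + 3\gamma, \tag{C.164} \]

for some $\gamma > 0$.
This means that,

\[ \limsup_{n \to \infty} \Pr\left\{ \frac{1}{n} \log \frac{1}{p_{Y^n}(Y^n)} > \overline{H}(\mathbf{Y}^*) + 2\gamma \right\} > 0. \tag{C.165} \]

By the definition of lim sup, for all sufficiently large $n$ we have,

\[ \frac{1}{n} \log \left| \mathcal{P}_n \right| + \gamma < \overline{H}(\mathbf{Y}^*) + 2\gamma, \tag{C.166} \]

so there exist an $\epsilon > 0$ and a subsequence $k_n$ with,

\[ \Pr\left\{ \frac{1}{k_n} \log \frac{1}{p_{Y^{k_n}}(Y^{k_n})} > \frac{1}{k_n} \log \left| \mathcal{P}_{k_n} \right| + \gamma \right\} > 2\epsilon. \tag{C.167} \]

Letting,

\[ A_{k_n} = \left\{ y \in \mathcal{Y}^{k_n} : p_{Y^{k_n}}(y) < \frac{e^{-k_n \gamma}}{\left| \mathcal{P}_{k_n} \right|} \right\}, \tag{C.168} \]

this reads,

\[ p_{Y^{k_n}}(A_{k_n}) > 2\epsilon. \tag{C.169} \]

Splitting $A_{k_n}$ over the permissible set (in this
subsequence) gives,

\begin{align}
p_{Y^{k_n}}(A_{k_n}) &= \sum_{y \in \mathcal{P}_{k_n} \cap A_{k_n}} p_{Y^{k_n}}(y) + \sum_{y \in \mathcal{P}_{k_n}^c \cap A_{k_n}} p_{Y^{k_n}}(y) \tag{C.170a} \\
&\le \frac{e^{-k_n \gamma}}{\left| \mathcal{P}_{k_n} \right|} \sum_{y \in \mathcal{P}_{k_n} \cap A_{k_n}} 1 + \sum_{y \in \mathcal{P}_{k_n}^c \cap A_{k_n}} p_{Y^{k_n}}(y) \tag{C.170b} \\
&\le e^{-k_n \gamma} + p_{Y^{k_n}}(\mathcal{P}_{k_n}^c), \tag{C.170c}
\end{align}

so that, for all sufficiently large $n$,

\[ p_{Y^{k_n}}(\mathcal{P}_{k_n}^c) > 2\epsilon - e^{-k_n \gamma} > \epsilon. \tag{C.170d} \]

This shows $p_{Y^{k_n}}(\mathcal{P}_{k_n}) \not\to 1$ and we have a contradiction showing it is impossible for $\mathbf{Y} \in \mathcal{T}_0$. ■

Appendix D
Type Set Size Entropy

Theorem D.1: Let $(p_1, p_2, \ldots)$ be a sequence of types defined
over the finite alphabet $\mathcal{X}$ where $p_n \in \mathcal{P}_n$. Assume this
sequence satisfies the following:

1) $p_n \to p$
2) $p_n \ll p$, $\forall n$

Then,

\[ \lim_{n \to \infty} \frac{1}{n} \log \left| T(p_n) \right| = H(p). \tag{D.171} \]

Proof: We first show,

\[ \liminf_{n \to \infty} \frac{1}{n} \log \left| T(p_n) \right| \ge H(p). \tag{D.172} \]

A sharpening of Stirling's approximation states that for
$\frac{1}{12n+1} < \lambda_n < \frac{1}{12n}$,

\[ n! = \sqrt{2\pi} \, n^{n + \frac{1}{2}} e^{-n} e^{\lambda_n}. \]

Let the empirical distribution, $p_n$, be specified by
$(n_1, \ldots, n_{K_n})$. If we enumerate the outcomes as
$(a_1, \ldots, a_{K_n})$ we have that,

\[ p_n(a_i) = \frac{n_i}{n}. \]

By definition $\sum_i n_i = n$, and from the above condition
of absolute continuity we have that $K_n \le s(p)$ for all $n$, where
$s(p)$ is the size of the support of the final distribution. Then,

\begin{align*}
\log \left| T(p_n) \right| &= \log \binom{n}{n_1, \ldots, n_{K_n}} = \log \frac{n!}{\prod_{i=1}^{K_n} n_i!} \\
&= n \log n - \sum_{i=1}^{K_n} n_i \log n_i + \log \left( \sqrt{2\pi n} \, e^{\lambda_n} \right) - \sum_{i=1}^{K_n} \log \left( \sqrt{2\pi n_i} \, e^{\lambda_{n_i}} \right) \\
&\ge n H(p_n) - K_n \log \left( \sqrt{2\pi n} \, e^{1/12} \right),
\end{align*}

where the last step uses $n_i \le n$ and $\lambda_{n_i} < 1/12$.

This implies that,

\[ \frac{1}{n} \log \left| T(p_n) \right| \ge H(p_n) - \frac{K_n}{n} \log \left( \sqrt{2\pi n} \, e^{1/12} \right). \]

Taking the liminf of each side,

\[ \liminf_{n \to \infty} \frac{1}{n} \log \left| T(p_n) \right| \ge \liminf_{n \to \infty} H(p_n) = H(p). \tag{D.173} \]

Now we have from the type class upper-bound [14] that,

\[ \limsup_{n \to \infty} \frac{1}{n} \log \left| T(p_n) \right| \le \limsup_{n \to \infty} H(p_n). \tag{D.174} \]

Combining with (D.173) gives the desired result. ■

[11] A. Feinstein, "A new basic theorem of information theory," IEEE Trans.
on Information Theory, vol. 4, no. 4, pp. 2-22, Sep. 1954.
[12] L. M. Marvel, C. G. Boncelet, Jr, and C. T. Retter, "Spread spectrum
[13] J. Fridrich and M. Goljan, "Digital image steganography using stochastic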